Infinite mutex lock wait in "convert"

Posted: 2013-01-17T10:46:59-07:00
by lobster_johnson
Stack trace from gdb:

#0  0x00007f98d5fc889c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f98d5fc4065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f98d5fc3eba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3  0x00007f98d6683be5 in DestroySemaphoreInfo () from /usr/lib/x86_64-linux-gnu/libMagickCore.so.6
#4  0x00007f98d65fe7dc in DestroyLinkedList () from /usr/lib/x86_64-linux-gnu/libMagickCore.so.6
#5  0x00007f98d65e1b12 in DestroyExceptionInfo () from /usr/lib/x86_64-linux-gnu/libMagickCore.so.6
#6  0x00007f98d6616707 in GetLocaleMessage () from /usr/lib/x86_64-linux-gnu/libMagickCore.so.6
#7  0x00007f98d65e1cf1 in GetLocaleExceptionMessage () from /usr/lib/x86_64-linux-gnu/libMagickCore.so.6
#8  0x00007f98d65e24fc in ThrowMagickExceptionList () from /usr/lib/x86_64-linux-gnu/libMagickCore.so.6
#9  0x00007f98d65e2287 in ThrowMagickException () from /usr/lib/x86_64-linux-gnu/libMagickCore.so.6
#10 0x00007f98d6683cb1 in DestroySemaphoreInfo () from /usr/lib/x86_64-linux-gnu/libMagickCore.so.6
#11 0x00007f98d661b105 in MagickCoreTerminus () from /usr/lib/x86_64-linux-gnu/libMagickCore.so.6
#12 0x00007f98d65e19ed in ?? () from /usr/lib/x86_64-linux-gnu/libMagickCore.so.6
#13 0x00007f98d65e2130 in CatchException () from /usr/lib/x86_64-linux-gnu/libMagickCore.so.6
#14 0x00007f98d65211a3 in ?? () from /usr/lib/x86_64-linux-gnu/libMagickCore.so.6
#15 0x00007f98d6521df2 in QueueAuthenticPixelCacheNexus () from /usr/lib/x86_64-linux-gnu/libMagickCore.so.6
#16 0x00007f98d65411d1 in QueueAuthenticPixels () from /usr/lib/x86_64-linux-gnu/libMagickCore.so.6
#17 0x00007f98d28710e7 in ?? () from /usr/lib/x86_64-linux-gnu/ImageMagick-6.8.0/modules-Q16/coders/jpeg.so
#18 0x00007f98d657721c in ReadImage () from /usr/lib/x86_64-linux-gnu/libMagickCore.so.6
#19 0x00007f98d657772b in ReadImages () from /usr/lib/x86_64-linux-gnu/libMagickCore.so.6
#20 0x00007f98d6215c80 in ConvertImageCommand () from /usr/lib/x86_64-linux-gnu/libMagickWand.so.6
#21 0x00007f98d6282b69 in MagickCommandGenesis () from /usr/lib/x86_64-linux-gnu/libMagickWand.so.6
#22 0x00000000004007e7 in ?? ()
#23 0x00007f98d5c1c76d in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#24 0x0000000000400839 in ?? ()
#25 0x00007fffc7c2b698 in ?? ()
#26 0x000000000000001c in ?? ()
#27 0x000000000000000a in ?? ()
#28 0x00007fffc7c2cb95 in ?? ()
#29 0x00007fffc7c2cb9d in ?? ()
#30 0x00007fffc7c2cbaa in ?? ()
#31 0x00007fffc7c2cbb2 in ?? ()
#32 0x00007fffc7c2cbba in ?? ()
#33 0x00007fffc7c2cbc3 in ?? ()
#34 0x00007fffc7c2cbc7 in ?? ()
#35 0x00007fffc7c2cbcd in ?? ()
#36 0x00007fffc7c2cbd7 in ?? ()
#37 0x00007fffc7c2cc12 in ?? ()
#38 0x0000000000000000 in ?? ()

Command invoked is:

env MAGICK_TIME_LIMIT=30 MAGICK_THREAD_LIMIT=1 convert ...

This happens all the time, at random; the exact image does not seem to matter. We are using MAGICK_THREAD_LIMIT=1 because otherwise it hangs or crashes far more often.

Environment is Ubuntu Precise, 64-bit, and:

$ convert --version
Version: ImageMagick 6.8.0-1 2012-10-19 Q16 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2012 ImageMagick Studio LLC
Features: OpenMP

$ uname -a
Linux [...] 3.3.4-bengler #3 SMP Tue Jul 31 14:59:36 CEST 2012 x86_64 x86_64 x86_64 GNU/Linux

$ dpkg -l "*magick*" | grep "^i"
ii  imagemagick                               8:6.8.0.1-1bengler1                 image manipulation programs
ii  imagemagick-common                        8:6.8.0.1-1bengler1                 image manipulation programs -- infrastructure
ii  libmagickcore-dev                         8:6.8.0.1-1bengler1                 low-level image manipulation library - development files
ii  libmagickcore5                            8:6.8.0.1-1bengler1                 low-level image manipulation library
ii  libmagickcore5-extra                      8:6.8.0.1-1bengler1                 low-level image manipulation library - extra codecs
ii  libmagickwand-dev                         8:6.8.0.1-1bengler1                 image manipulation library - development files
ii  libmagickwand5                            8:6.8.0.1-1bengler1                 image manipulation library

We have rolled our own packages from another Ubuntu repo; I can find out which one if necessary.

Re: Infinite mutex lock wait in "convert"

Posted: 2013-01-17T11:04:44-07:00
by magick
Can you try ImageMagick-6.8.1-9, the current release? We tried your command under CentOS and could not reproduce it. You can also pass the --disable-openmp option to the configure script, then build and install. That will turn threading off. Let us know which, if any, of these solutions resolves the problem.

Re: Infinite mutex lock wait in "convert"

Posted: 2013-01-17T16:24:41-07:00
by lobster_johnson
According to the changelog, there are no pertinent fixes (related to threading, locking, SMP) between our version and the newest.

As I said, the image seems to be unimportant. The locking issue happens randomly, which suggests that it's a systematic race condition, not image-related.

Will try recompiling with --disable-openmp, although I have found discussions that imply that "MAGICK_THREAD_LIMIT=1" is effectively the same thing. As you can see, the stack trace is stuck in libpthread, not OpenMP.

Re: Infinite mutex lock wait in "convert"

Posted: 2013-01-17T17:39:04-07:00
by magick
Replace ThrowFatalException() in magick/semaphore.c/DestroySemaphoreInfo() with perror() followed by _exit(1). That should prevent the deadlock, but it does not explain why the mutex cannot be destroyed.

Re: Infinite mutex lock wait in "convert"

Posted: 2013-01-17T17:43:21-07:00
by lobster_johnson
Could do, although I would rather find who's locking the mutex. Any way to list known mutexes and locks, similar to the ipcs command for shared memory and semaphores?

Also: I haven't read the code, but from the stack trace it looks like it's quitting because of an exception. Is this the case?

Re: Infinite mutex lock wait in "convert"

Posted: 2013-01-17T19:45:35-07:00
by magick
Right, it's in the termination code: it destroys the messaging system, then tries to call it when it can't destroy the semaphore mutex. We've been using this same code for about 10 years now and have not seen this particular problem. The perror() prevents the deadlock because it does not call the ImageMagick messaging system, but as we said, it does not explain why destroying the mutex fails. The perror() should print the errno value. When you get it, post it here. Perhaps it will give us some insight into the problem.

Re: Infinite mutex lock wait in "convert"

Posted: 2013-01-18T07:19:34-07:00
by lobster_johnson
"We've been using this same code for about 10 years now"
To be fair, though, operating systems and external library dependencies change. We have had ridiculous amounts of trouble with ImageMagick crashing, hanging and allocating huge amounts of memory on our multicore boxes, which is why we eventually had to add MAGICK_THREAD_LIMIT and MAGICK_TIME_LIMIT. (We also tried setting the memory limit, but it turns out ImageMagick starts to behave unpredictably then; it doesn't just gracefully use the disk if the memory limit is low.) And this is just the latest problem of many. That you have not encountered any problems does not preclude their existence; we certainly have.
"The perror() should print the errno value. When you get it, post it here. Perhaps it will give us some insight into the problem."
Here is one that I believe is related, but I can't be sure:

convert.im6: time limit exceeded `No such file or directory' @ fatal/cache.c/GetImagePixelCache/2044.

Most of our logs are actually simply empty, i.e. no stderr output. The reason may be that hanging processes must be killed with SIGKILL, and perhaps output is buffered; I don't know.

Once I have a hanging process (it happens pretty much daily), is there anything useful I can do to inspect it? Anything with gdb to find the error, for example?

Re: Infinite mutex lock wait in "convert"

Posted: 2013-01-18T07:40:44-07:00
by magick
We are investigating. We've been running http://www.imagemagick.org/MagickStudio ... Studio.cgi for over 10 years now, multithreaded, and have not encountered any of the problems you mention. This site processes thousands of images each day. To prevent accidental DoS, we first ping the image (ping does not allocate pixel buffers) to ensure it's a reasonable size (less than 4096x4096) before we then read the image (at which point pixel buffers are allocated). We also set these limits to prevent any one session from consuming the server:
    our $DiskLimit = "1.5GB"; # disk limit
    our $ExpireCache = '+1h'; # when to expire browser cache
    our $ExpireThreshold = 8*3600; # work files time to live
    our $LoadAverageThreshold = 10.0; # suspend service if load average exceeds threshold
    our $MapLimit = "512MB"; # map limit
    our $MaxFilesize = 4096; # max file size in kilobytes
    our $MaxImageArea = 16384; # max image area in kilobytes (width*height)
    our $MaxImageExtent = 16384; # max image extent in kilobytes (width*height*frames)
    our $MaxWorkFiles = 8192; # max number of work files
    our $MemoryLimit = "256MB"; # memory limit
    our $ThreadLimit = "2"; # thread limit
    our $MinExpireAge = 7200; # minimum expire age
    our $TimeLimit = 120; # time limit to stop runaway jobs
    our $Timeout = 120; # timeout value for uploading an image
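The ping-before-read check described above might be sketched as follows in Python. This is only an illustration under assumptions: the helper names are hypothetical, the 4096 limit is treated as inclusive, and only the first frame is pinged (via the `[0]` suffix) to keep the output to a single width/height pair.

```python
import subprocess

MAX_WIDTH = 4096   # limit quoted in the post above
MAX_HEIGHT = 4096


def parse_dimensions(identify_output):
    """Parse 'WIDTH HEIGHT' as printed by identify -format '%w %h'."""
    width, height = identify_output.split()[:2]
    return int(width), int(height)


def is_reasonable_size(width, height,
                       max_width=MAX_WIDTH, max_height=MAX_HEIGHT):
    """True if both dimensions are positive and within the limits."""
    return 0 < width <= max_width and 0 < height <= max_height


def ping_image(path, timeout=30):
    """Run identify -ping on the first frame and return (width, height).

    Pinging reads the image header only, so no pixel buffers are allocated.
    """
    result = subprocess.run(
        ["identify", "-ping", "-format", "%w %h", path + "[0]"],
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return parse_dimensions(result.stdout)
```

A caller would invoke `ping_image()` first and hand the file to convert only if `is_reasonable_size()` returns True.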
It is of course quite difficult to debug this problem remotely. You can use gdb to attach to a running process and if it fails, type bt to get the stack trace.
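For unattended capture, attaching gdb can be scripted. The sketch below, in Python, builds a non-interactive gdb invocation using gdb's -p, -batch and -ex options; the function names are hypothetical, and attaching may require running as the same user or as root depending on the system's ptrace settings.

```python
import subprocess


def gdb_backtrace_command(pid):
    """Build a gdb command line that attaches to pid and dumps all stacks."""
    return ["gdb", "-p", str(int(pid)), "-batch",
            "-ex", "thread apply all bt"]


def capture_backtrace(pid, timeout=60):
    """Attach to a (possibly hung) process and return gdb's output as text."""
    result = subprocess.run(
        gdb_backtrace_command(pid),
        capture_output=True, text=True, timeout=timeout,
    )
    return result.stdout
```

Calling `capture_backtrace()` on a hung convert process before killing it would preserve a trace like the one at the top of this thread, one per thread.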

We still recommend you install ImageMagick 6.8.1-10. There was some work done on the pixel cache that may resolve a couple of your problems.

ImageMagick 7 includes a distributed pixel cache to offload pixel resources to a remote server. However, version 7 is alpha and the network latency slows down processing considerably.

Re: Infinite mutex lock wait in "convert"

Posted: 2013-01-18T08:16:58-07:00
by lobster_johnson
Does MagickStudio use any of the libmagick* libraries directly, or does it invoke the command line tools such as identify and convert?

We also ping the image (identify -ping) to determine the original format, size, colour space, etc. before processing. However, we don't have a limit on image sizes. Also, it's always the convert command that hangs, never the identify command.

As I said, we tried setting MAGICK_MEMORY_LIMIT to a low number and MAGICK_DISK_LIMIT to a large one, in order to make ImageMagick spool large data buffers to disk. However, it just caused problems. (It's been a while, so I don't remember the details. This was back in 6.4 or 6.5, I think.)

We are testing 6.8.1-10 now.

Re: Infinite mutex lock wait in "convert"

Posted: 2013-01-18T08:22:21-07:00
by magick
Our web interface uses PerlMagick. But the command-line programs (e.g. convert) and the scripting languages all call the same API, MagickCore. It should produce identical results.

Re: Infinite mutex lock wait in "convert"

Posted: 2013-01-18T08:26:17-07:00
by lobster_johnson
Does the API go through the same exception-handling and exiting path as exemplified by my stack trace? Does the API dump errors to stderr?

Re: Infinite mutex lock wait in "convert"

Posted: 2013-01-18T08:34:21-07:00
by magick
Yes. If a method fails, the PerlMagick method returns it as a status message. The web interface picks up the status and displays it on an error page. We have not seen many of these over the years (just a handful).

Re: Infinite mutex lock wait in "convert"

Posted: 2013-01-18T08:49:01-07:00
by lobster_johnson
Well, our system is currently processing about 8000 images per hour, and we have ImageMagick processes hanging pretty much every day. What is different from your system may be that we are forking "convert" for each image.

I'm going to wait for a hanging process and see if I can get some more information out of it.

(I have added MAGICK_MEMORY_LIMIT, MAGICK_MAP_LIMIT and MAGICK_DISK_LIMIT now, just in case the problems with those settings have been resolved.)
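A rough Python sketch of the fork-per-image setup with these limits plus an external watchdog is shown below. The function names are hypothetical and the limit values are placeholders (30 and 1 come from the first post, the memory, map and disk values from the MagickStudio settings quoted earlier), so none of this reflects our actual production code.

```python
import os
import subprocess


def magick_env(time_limit=30, thread_limit=1,
               memory="256MB", map_limit="512MB", disk="1.5GB"):
    """Copy the current environment and add ImageMagick resource limits."""
    env = dict(os.environ)
    env.update({
        "MAGICK_TIME_LIMIT": str(time_limit),
        "MAGICK_THREAD_LIMIT": str(thread_limit),
        "MAGICK_MEMORY_LIMIT": memory,
        "MAGICK_MAP_LIMIT": map_limit,
        "MAGICK_DISK_LIMIT": disk,
    })
    return env


def run_with_deadline(argv, deadline, env=None):
    """Run argv and kill it if it outlives `deadline` seconds.

    Returns (returncode, stderr_text, timed_out); returncode is None when
    the process had to be killed. subprocess.run() kills the child itself
    before raising TimeoutExpired.
    """
    try:
        result = subprocess.run(
            argv, env=env, stdout=subprocess.DEVNULL,
            stderr=subprocess.PIPE, timeout=deadline,
        )
    except subprocess.TimeoutExpired as exc:
        stderr = exc.stderr or b""
        return None, stderr.decode(errors="replace"), True
    return result.returncode, result.stderr.decode(errors="replace"), False
```

Something like `run_with_deadline(["convert", "in.jpg", "out.png"], deadline=60, env=magick_env())` would then enforce a hard deadline somewhat above MAGICK_TIME_LIMIT, so a process stuck in a lock wait still gets reaped and whatever it wrote to stderr is kept.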

Re: Infinite mutex lock wait in "convert"

Posted: 2013-01-22T06:37:56-07:00
by lobster_johnson
I have packaged everything I could think of in a tarball for you. The tarball includes a gdb coredump (usable with "gdb -c") as well as separate dumps of each memory-mapped process region.

If you can give me your email, I can send you a link. I'd rather not divulge the data in public in case it (accidentally) contains anything confidential.

(This is still with 6.8.0. We are in the process of getting 6.8.1-9 into production, just running some automated tests.)

Re: Infinite mutex lock wait in "convert"

Posted: 2013-01-22T06:41:19-07:00
by magick
Send it to xxxxxxx@imagemagick.org. Substitute 'patches' for xxxxxxx.