Last modified on 11 April 2013, at 23:47

Linux Applications Debugging Techniques/Deadlocks

AnalysisEdit

Searching for a deadlock means reconstructing the graph of dependencies between threads and resources (mutexes, semaphores, condition variables, etc.) - who owns what and who wants to acquire what. A typical deadlock would look like a loop in that graph. The task is tedious, as some of the parameters we are looking for have been optimized by the compiler into registers.

Below is an analysis of an x86_64 deadlock. On this platform, register r8 is the one containing the first argument: the address of the mutex:

(gdb) thread apply all bt
...
Thread 4 (Thread 0x419bc940 (LWP 12275)):
#0  0x0000003684c0d4c4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003684c08e1a in _L_lock_1034 () from /lib64/libpthread.so.0
#2  0x0000003684c08cdc in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000400a50 in thread1 (threadid=0x1) at deadlock.c:66
#4  0x0000003684c0673d in start_thread () from /lib64/libpthread.so.0
#5  0x00000036840d3d1d in clone () from /lib64/libc.so.6
 
Thread 3 (Thread 0x421bd940 (LWP 12276)):
#0  0x0000003684c0d4c4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003684c08e1a in _L_lock_1034 () from /lib64/libpthread.so.0
#2  0x0000003684c08cdc in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000400c07 in thread2 (threadid=0x2) at deadlock.c:111
#4  0x0000003684c0673d in start_thread () from /lib64/libpthread.so.0
#5  0x00000036840d3d1d in clone () from /lib64/libc.so.6
...
 
 
(gdb) thread 4
[Switching to thread 4 (Thread 0x419bc940 (LWP 12275))]#2  0x0000003684c08cdc in pthread_mutex_lock ()
   from /lib64/libpthread.so.0
 
(gdb) frame 2
#2  0x0000003684c08cdc in pthread_mutex_lock () from /lib64/libpthread.so.0
 
(gdb) info reg
...
r8             0x6015a0	6296992
...
 
(gdb) p *(pthread_mutex_t*)0x6015a0
$3 = {
  __data = {
    __lock = 2, 
    __count = 0, 
    __owner = 12276,   <== LWP 12276 is Thread 3
    __nusers = 1, 
    __kind = 0,        <== non-recursive
    __spins = 0, 
    __list = {
      __prev = 0x0, 
      __next = 0x0
    }
  }, 
  __size =     "\002\000\000\000\000\000\000\000\364/\000\000\001", '\000' <repeats 26 times>, 
  __align = 2
}
 
(gdb) thread 3
[Switching to thread 3 (Thread 0x421bd940 (LWP 12276))]#0  0x0000003684c0d4c4 in __lll_lock_wait ()
   from /lib64/libpthread.so.0
 
(gdb) bt
#0  0x0000003684c0d4c4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003684c08e1a in _L_lock_1034 () from /lib64/libpthread.so.0
#2  0x0000003684c08cdc in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000400c07 in thread2 (threadid=0x2) at deadlock.c:111
#4  0x0000003684c0673d in start_thread () from /lib64/libpthread.so.0
#5  0x00000036840d3d1d in clone () from /lib64/libc.so.6
#2  0x0000003684c08cdc in pthread_mutex_lock () from /lib64/libpthread.so.0
 
(gdb) info reg
...
r8             0x6015e0	6297056
...
 
(gdb) p *(pthread_mutex_t*)0x6015e0
$4 = {
  __data = {
    __lock = 2, 
    __count = 0, 
    __owner = 12275,  <=== Thread 4
    __nusers = 1, 
    __kind = 0, 
    __spins = 0, 
    __list = {
      __prev = 0x0, 
      __next = 0x0
    }
  }, 
  __size =     "\002\000\000\000\000\000\000\000\363/\000\000\001", '\000' <repeats 26 times>, 
  __align = 2
}

Threads 3 and 4 are deadlocking over two mutexes.


Note: If gdb is unable to find the symbol pthread_mutex_t because it has not loaded the symbol table for pthreadtypes.h, you can still print the individual members of the struct as follows:

(gdb) print *((int*)(0x6015e0))
$4 = 2
(gdb) print *((int*)(0x6015e0)+1)
$5 = 0
(gdb) print *((int*)(0x6015e0)+2)
$6 = 12275


AutomationEdit

An interposition library can be built to automate deadlock analysis. A significant number of APIs have to be interposed and even then there are cases that would go unnoticed by the library. For instance, one creative way to deadlock two threads without involving any userland locking mechanism is to have each thread join the other. Thus, interposition tools have limited diagnostic functionality.

There are a number of other tools available:

  • Userspace lockdep. Incipient work.
  • Locksmith. Basic.
  • Valgrind Helgrind. Does not suffer from the terrible slowdown other Valgrind tools have (in particular the memory analysis ones) but does not survives long on an amd64 platform.


pthreadsEdit

pthreads has some built-in support for certain synchronization mechanisms (e.g. PTHREAD_MUTEX_ERRORCHECK mutexes).

Also, there's a POSIX mutex construction attribute for robust mutexes (a recoverable mutex that was held by a thread whose process has died): an error return code indicates the earlier owner died; and lock acquisition implies acquired responsibility for dealing with any cleanup.


lockdepEdit

On lockdep-enabled kernels, pressing Alt+SysRq+d will dump information about all the locks known to the kernel.