Linux Applications Debugging Techniques/Core files

A core dump is a snapshot of the memory of the program, processor registers including program counter and stack pointer and other OS and memory management information, taken at a certain point in time. As such, they are invaluable for capturing the state of rare occurring races and abnormal conditions.

What is more, such rarities will be found usually on heavily used production or QA machines where gdb is not available, nor is access easy to the machine. Worse, the heaviest users are usually the biggest clients (moneywise...). As such, it is important to get as much forensic data as available, and plan for it.

One can force a core dump from within the program or from outside at chosen moments. What a core cannot tell is how the application ended up in that state: the core is no replacement for a good log. Verbose logs and core files go hand in glove.

Prerequisites

For a process to be able to dump core, a few prerequisites have to be met:

the set core size limit should permit it (see the man page for ulimit). E.g.: ulimit -c unlimited. It can also be set from within the program.
the process to dump core should have write permissions to the folder where the core is to be dumped to (usually the current working directory of the process)

Where is my core?

Usually the core is dumped in the current working directory of the process. But the OS can be configured otherwise:

# cat /proc/sys/kernel/core_pattern
%h-%e-%p.core

# sysctl -w "kernel.core_pattern=/var/cores/%h-%e-%p.core"

Dumping core from outside the program

One possibility is with gdb, if available. This will let the program running:

(gdb) attach <pid>
(gdb) generate-core-file <optional-filename>
(gdb) detach

Another possibility is to signal the process. This will terminate it, assuming the signal is not caught by a custom signal handler:

kill -s SIGABRT <pid>

Dumping core from within the program

Again, there are two possibilities: dump core and terminate the program or dump and continue:

void dump_core_and_terminate(void)
{
    /*
     * Alternative:
     *   char *p = NULL; *p = 0;
     */
    abort();
}

void dump_core_and_continue(void)
{
    pid_t child = fork();
    if (child < 0) {
        /*Parent: error*/
    }
    else if (child == 0) {
        dump_core_and_terminate(); /*Child*/
    }
    else {
        /*Parent: continue*/
    }
}

Note: use dump_core_and_continue() with care: in a multi-threaded program, the forked child will have only a clone of the parent thread that called fork() [Butenhof Ch5; re: threads & fork]. This has number of implications, in particular with respect to mutexes, but the particular point here is that the core that the child will dump will contain information only for one thread. If you need to dump a core with all threads without aborting the process, try to use the google core dumper library, even if it has not been maintained for years.

Shared libraries

To obtain a good call stack, it is important that the gdb loads the same libraries that were loaded by the program that generated the core dump. If the machine we are analyzing the core has different libraries (or has them in different places) from the machine the core was dumped, then copy over the libraries to the analyzing machine, in a way that mirrors the dump machine. For instance:

$ tree .
.
|-- juggler-29964.core
|-- lib64
|   |-- ld-linux-x86-64.so.2
|   |-- libc.so.6
|   |-- libm.so.6
|   |-- libpthread.so.0
|   `-- librt.so.1
...

At the gdb prompt:

(gdb) set solib-absolute-prefix ./
(gdb) set solib-search-path .
(gdb) file ../../../../../threadpool/bin.v2/libs/threadpool/example/juggler/gcc-4.1.2/debug/link-static/threading-multi/juggler
Reading symbols from /home/aurelian_melinte/threadpool/threadpool-0_2_5-src/threadpool/bin.v2/libs/threadpool/example/juggler/gcc-4.1.2/debug/link-static/threading-multi/juggler...done.
(gdb) core-file juggler-29964.core 
Reading symbols from ./lib64/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for ./lib64/librt.so.1
Reading symbols from ./lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for ./lib64/libm.so.6
Reading symbols from ./lib64/libpthread.so.0...(no debugging symbols found)...done.
Loaded symbols for ./lib64/libpthread.so.0
Reading symbols from ./lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for ./lib64/libc.so.6
Reading symbols from ./lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for ./lib64/ld-linux-x86-64.so.2
Core was generated by `../../../../bin.v2/libs/threadpool/example/juggler/gcc-4.1.2/debug/link-static/'.
Program terminated with signal 6, Aborted.
#0  0x0000003684030265 in raise () from ./lib64/libc.so.6
(gdb) frame 2
#2  0x0000000000404ae1 in dump_core_and_terminate () at juggler.cpp:30

Source Code

To point the debugger to the source files:

(gdb) set substitute-path /from/path1 /to/path1
(gdb) set substitute-path /from/path2 /to/path2

analyze-cores

Here is a script that will generate a basic report per core file. Useful the days when cores are raining on you:

#!/bin/bash

#
# A script to extract core-file informations
#


if [ $# -ne 1 ]
then
  echo "Usage: `basename $0` <for-binary-image>"
  exit -1
else
  binimg=$1
fi


# Today and yesterdays cores
cores=`find . -name '*.core' -mtime -1`

#cores=`find . -name '*.core'`


for core in $cores 
do 
  gdblogfile="$core-gdb.log"
  rm $gdblogfile
  
  bininfo=`ls -l $binimg`
  coreinfo=`ls -l $core`
  
  gdb -batch \
      -ex "set logging file $gdblogfile" \
      -ex "set logging on" \
      -ex "set pagination off" \
      -ex "printf \"**\n** Process info for $binimg - $core \n** Generated `date`\n\"" \
      -ex "printf \"**\n** $bininfo \n** $coreinfo\n**\n\"" \
      -ex "file $binimg" \
      -ex "core-file $core" \
      -ex "bt" \
      -ex "info proc" \
      -ex "printf \"*\n* Libraries \n*\n\"" \
      -ex "info sharedlib" \
      -ex "printf \"*\n* Memory map \n*\n\"" \
      -ex "info target" \
      -ex "printf \"*\n* Registers \n*\n\"" \
      -ex "info registers" \
      -ex "printf \"*\n* Current instructions \n*\n\"" -ex "x/16i \$pc" \
      -ex "printf \"*\n* Threads (full) \n*\n\"" \
      -ex "info threads" \
      -ex "bt" \
      -ex "thread apply all bt full" \
      -ex "printf \"*\n* Threads (basic) \n*\n\"" \
      -ex "info threads" \
      -ex "thread apply all bt" \
      -ex "printf \"*\n* Done \n*\n\"" \
      -ex "quit" 
done

An alternative worth exploring is btparser.

Canned user-defined commands

Same reporting functionality can be canned for gdb:

define procinfo
    printf "**\n** Process Info: \n**\n"
    info proc
    
    printf "*\n* Libraries \n*\n"
    info sharedlib
    
    printf "*\n* Memory Map \n*\n"
    info target
    
    printf "*\n* Registers \n*\n"
    info registers
    
    printf "*\n* Current Instructions \n*\n" 
    x/16i $pc
    
    printf "*\n* Threads (basic) \n*\n"
    info threads
    thread apply all bt
end
document procinfo
Infos about the debugee. 
end

define analyze
    procinfo
    
    printf "*\n* Threads (full) \n*\n"
    info threads
    bt
    thread apply all bt full
end

analyze-pid

A script that will generate a basic report and a core file for a running process:

#!/bin/bash

#
# A script to generate a core and a status report for a running process. 
#


if [ $# -ne 1 ]
then
  echo "Usage: `basename $0` <PID>"
  exit -1
else
  pid=$1
fi


gdblogfile="analyze-$pid.log"
rm $gdblogfile

corefile="core-$pid.core"
  
gdb -batch \
      -ex "set logging file $gdblogfile" \
      -ex "set logging on" \
      -ex "set pagination off" \
      -ex "printf \"**\n** Process info for PID=$pid \n** Generated `date`\n\"" \
      -ex "printf \"**\n** Core: $corefile \n**\n\"" \
      -ex "attach $pid" \
      -ex "bt" \
      -ex "info proc" \
      -ex "printf \"*\n* Libraries \n*\n\"" \
      -ex "info sharedlib" \
      -ex "printf \"*\n* Memory map \n*\n\"" \
      -ex "info target" \
      -ex "printf \"*\n* Registers \n*\n\"" \
      -ex "info registers" \
      -ex "printf \"*\n* Current instructions \n*\n\"" -ex "x/16i \$pc" \
      -ex "printf \"*\n* Threads (full) \n*\n\"" \
      -ex "info threads" \
      -ex "bt" \
      -ex "thread apply all bt full" \
      -ex "printf \"*\n* Threads (basic) \n*\n\"" \
      -ex "info threads" \
      -ex "thread apply all bt" \
      -ex "printf \"*\n* Done \n*\n\"" \
      -ex "generate-core-file $corefile" \
      -ex "detach" \
      -ex "quit"

Thread Local Storage

TLS data is rather difficult to access with gdb in the core files, and __tls_get_addr() cannot be called.

References

__thread variables