How Can I Check Backtrace Files?

Hello,

I carefully read [1] which details how backtrace files can be produced.

Maybe this seems natural to some, but how can I go one step further and check that the produced XXX-thread1.txt, XXX-brief.txt, … files are OK?

In other words, where can I find an example of how to use one of those files, so I can check for myself that, if the system ever fails, I won’t have to wait for another failure to provide the data support teams need?

Best regards

[1] https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace

4 thoughts on - How Can I Check Backtrace Files?

  • It’s a great question but I could spend a week answering it and not scratch the surface. 🙂 It’s not a straightforward thing unless you know the code in question.

    The most common case is a segmentation fault (segfault or SEGV). In that case, the thread1.txt file is the place to start. Since most of the objects passed around are really pointers to objects, the most obvious cause would be a 0x0 value, for instance “chan=0x0”. That would be a pointer to a channel object that was not set when it probably should have been.

    Unfortunately, it’s not only 0x0 that can cause a SEGV. Any time a program tries to access memory it doesn’t own, that signal is raised. So let’s say there’s a 256-byte buffer which the process owns. If a bug somewhere causes the program to try to access bytes beyond the end of the buffer, you MAY get a SEGV if the process doesn’t also own that memory. In that case, the backtrace won’t show anything obvious because the pointers all look valid. There would probably be an index variable (i or ix, etc.) that may be set to 257, but you’d have to know that the buffer was only 256 bytes to realize that was the issue. (See the first C sketch at the end of this reply.)

    Deadlocks are even harder to troubleshoot. For those, you need to look at full.txt to see where the threads are stuck and find the one thread that’s holding the lock the others are stuck on. (See the second sketch at the end of this reply.)

    Sorry. I wish I had a better answer because it’d help a lot if folks could do more investigation themselves.
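
    To make those two failure modes concrete, here is a minimal, hypothetical C sketch (it is not Asterisk code; struct channel, use_channel() and overrun() are invented names for illustration). The first part dereferences an object pointer that was never set, which in a thread1.txt-style backtrace would show up as a frame with something like “chan=0x0”; the second part walks an index one step past the end of a 256-byte buffer, which may or may not raise a SEGV depending on whether the process owns the adjacent memory.

    /* segv_sketch.c: hypothetical example, not Asterisk code.
     * Build with symbols: gcc -g -O0 -o segv_sketch segv_sketch.c
     */
    #include <stdio.h>
    #include <stdlib.h>

    struct channel {                       /* stand-in for an Asterisk-style object */
        char *name;
    };

    /* Failure mode 1: the caller never set chan, so chan == NULL (0x0).
     * The crashing frame would look something like "use_channel (chan=0x0)". */
    static void use_channel(struct channel *chan)
    {
        printf("channel name: %s\n", chan->name);  /* reads through a NULL pointer: SEGV */
    }

    /* Failure mode 2: the pointer is valid, but the index runs past the end of
     * the 256-byte buffer.  Whether this raises a SEGV depends on whether the
     * process also owns the memory just beyond the buffer. */
    static void overrun(void)
    {
        char *buf = malloc(256);           /* the process owns these 256 bytes */
        if (!buf) {
            return;
        }
        for (int i = 0; i <= 256; i++) {   /* off-by-one: i reaches 256 */
            buf[i] = 'x';                  /* writes one byte past the end */
        }
        free(buf);
    }

    int main(void)
    {
        overrun();                         /* may or may not crash */
        use_channel(NULL);                 /* simulates an unset object pointer */
        return 0;
    }

    And a similarly hypothetical sketch of the deadlock case: two threads taking the same two locks in opposite order. Running it just hangs; a core taken at that point would show both threads blocked inside pthread_mutex_lock() in full.txt, and the job is to work out which thread holds the lock the others are waiting on.

    /* deadlock_sketch.c: hypothetical example, not Asterisk code.
     * Build: gcc -g -O0 -pthread -o deadlock_sketch deadlock_sketch.c
     */
    #include <pthread.h>
    #include <unistd.h>

    static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

    static void *thread_one(void *arg)
    {
        (void) arg;
        pthread_mutex_lock(&lock_a);       /* holds A ... */
        sleep(1);
        pthread_mutex_lock(&lock_b);       /* ... and waits forever for B */
        pthread_mutex_unlock(&lock_b);
        pthread_mutex_unlock(&lock_a);
        return NULL;
    }

    static void *thread_two(void *arg)
    {
        (void) arg;
        pthread_mutex_lock(&lock_b);       /* holds B ... */
        sleep(1);
        pthread_mutex_lock(&lock_a);       /* ... and waits forever for A */
        pthread_mutex_unlock(&lock_a);
        pthread_mutex_unlock(&lock_b);
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, thread_one, NULL);
        pthread_create(&t2, NULL, thread_two, NULL);
        pthread_join(t1, NULL);            /* never returns: a classic ABBA deadlock */
        pthread_join(t2, NULL);
        return 0;
    }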

  • 2017-12-06 15:52 GMT+01:00 George Joseph:

    Thanks very much for trying, anyway 😉

    True! I experienced segfaults lately, and I could not configure the platform I used then (Debian Jessie) to produce core files in a directory Asterisk could write into. Now, with Debian Stretch, I can produce core files at will (with a kill -s SIGSEGV). I checked that ast_coredumper worked OK, as it produced the thread1.txt file and so on.

    Ideally, I would like to go one step further: check now that a future .txt file would be “workable” (and not “you should have compiled with option XXX or configured with option YYY”).

    So, with an artificial kill -s SIGSEGV, does the output below prove I have workable .txt files (having .txt files that let people find the root cause of the issue is another story, as we can probably only hope for the best there)?

    # head core-brief.txt
    !@!@!@! brief.txt !@!@!@!

    Thread 38 (Thread 0x7f2aa5dd0700 (LWP 992)):
    #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
    #1 0x000055cdcb69ae84 in __ast_cond_timedwait (filename=0x55cdcb7d4910 "threadpool.c", lineno=1131, func=0x55cdcb7d4ea8 <__PRETTY_FUNCTION__.8978> "worker_idle", cond_name=0x55cdcb7d4b7f "&worker->cond", mutex_name=0x55cdcb7d4b71 "&worker->lock", cond=0x7f2abc000978, t=0x7f2abc0009a8, abstime=0x7f2aa5dcfc30) at lock.c:668
    #2 0x000055cdcb75d153 in worker_idle (worker=0x7f2abc000970) at threadpool.c:1131
    #3 0x000055cdcb75ce61 in worker_start (arg=0x7f2abc000970) at threadpool.c:1022
    #4 0x000055cdcb769a8c in dummy_start (data=0x7f2abc000a80) at utils.c:1238
    #5 0x00007f2aeddad494 in start_thread (arg=0x7f2aa5dd0700) at pthread_create.c:333

  • The number one question when you supply a backtrace: Does it have symbols?

    So yes, the sample above is at least workable. It has symbols as it shows the function name, source file name, and line number in the backtrace. Without symbols nobody can look at the backtrace and see what is going on. It is just a bunch of numbers and question marks (??) with maybe a public function name.

    The second question: Is the backtrace from an unoptimized build?

    Optimized builds provide some performance improvement for normal operation. However, what the compiler does to the code can be difficult to figure out from a backtrace: it can optimize out variables, which makes understanding what is going on harder. (See the C sketch below.)

    So whether a backtrace from an optimized build can help find the root cause depends on what happened. It is up to you whether or not you want to run an optimized build in production.

    I also recommend always compiling with BETTER_BACKTRACES enabled in menuselect. With that enabled, any backtraces put into the log files by FRACKs, and the lock output from the CLI command “core show locks”, are understandable when symbols are available. You get backtraces similar to the sample above.
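
    As a hypothetical illustration of both points, symbols and optimization (this is not Asterisk code, and level_one()/level_two()/level_three() are invented names), the tiny program below crashes three calls deep. Built with gcc -g -O0, the gdb “bt” output shows every function with its source file and line number, much like the sample above. Strip the binary, or lose the debug info, and the same trace collapses toward bare addresses and ?? entries. Build it with -O2 instead and the compiler may inline level_two() and level_three() and report len as <optimized out>, which is exactly why an unoptimized build is easier to read.

    /* symbols_sketch.c: hypothetical example, not Asterisk code.
     *
     *   gcc -g -O0 -o with_symbols symbols_sketch.c
     *       -> backtrace shows function, file and line for every frame
     *   strip with_symbols
     *       -> mostly addresses and ?? entries, no file:line information
     *   gcc -g -O2 -o optimized symbols_sketch.c
     *       -> frames may be inlined, locals may show as <optimized out>
     */
    static int level_three(const char *msg)
    {
        return msg[0];                 /* dereferences a NULL pointer: SEGV at -O0 */
    }

    static int level_two(const char *msg)
    {
        int len = level_three(msg);    /* "len" may show as <optimized out> at -O2 */
        return len + 1;
    }

    static int level_one(void)
    {
        return level_two(NULL);        /* a NULL pointer passed down the call chain */
    }

    int main(void)
    {
        return level_one();
    }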

    Richard

  • That’s it! The key pieces of information are the function names
    (worker_idle, worker_start, etc.), the file names (threadpool.c, etc.) and the line numbers (1131, 1022, etc.).