Asterisk Server Hangs After Bridging 2 Channels


If I use either the Bridge() app or the manager Action: Bridge() in a certain scenario (basically to bridge 2 SIP channels, like an attended transfer, resulting in 2 other SIP channels being discarded), then the whole server locks solid. The console stops, the network stops, something is hammering the box, and nothing (including debug tools) seems to be able to do anything about it.

If I ‘nice’ asterisk to lowest priority, and ‘nice’ a copy of ‘top’ to highest priority, everything still locks. After a short period, the box recovers, seemingly due to the 60 second RTP timer. Anything that was being logged is lost.

My theory is that I am somehow causing a frame loop internal to Asterisk by setting up some type of illegal bridge, but I used the same code on 1.2 (I backported Bridge()) and it works just fine.

I need suggestions please on how to determine where it is locking, and why.

UPDATE:

I found that doing a build with debug symbols included and running under gdb slowed down asterisk enough for me to get debug output. Turns out that 1.6.2.11 fixes the symptom, and the cause is in my own “hack” to the code. Holding locks too long == verybad.

I was running in high priority mode – I thought I was turning it off for testing, but looks like I left the setting in asterisk.conf so leaving ‘-p’ off the command line was making no difference *sigh*

I believe the issue was actually with the devicestate thread. It was trying to update state on a locked channel, and was trying to grab the lock so regularly that it caused asterisk to grab lots of CPU cycles (because of -p mode). The lock was not released because its holder was waiting on a database write, which was being done by a lower priority external process that was getting no time scheduled to it.
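
As a rough illustration of the spinning described above, here is a bounded retry loop that sleeps between attempts, so that a lower priority process can get scheduled while the lock is busy. This is a minimal sketch using plain POSIX mutexes, not Asterisk's own locking code; `lock_with_retries` and its parameters are hypothetical names invented for the example.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical helper, not an Asterisk API: try to take a mutex a bounded
 * number of times, sleeping between attempts instead of busy-waiting. */
int lock_with_retries(pthread_mutex_t *lock, int max_attempts, long sleep_usec)
{
    struct timespec delay;
    int attempt;

    assert(lock != NULL && max_attempts > 0 && sleep_usec >= 0);
    delay.tv_sec = sleep_usec / 1000000L;
    delay.tv_nsec = (sleep_usec % 1000000L) * 1000L;

    for (attempt = 0; attempt < max_attempts; attempt++) {
        int res = pthread_mutex_trylock(lock);
        if (res == 0) {
            return 0;               /* got the lock */
        }
        if (res != EBUSY) {
            return -1;              /* unexpected error, stop retrying */
        }
        nanosleep(&delay, NULL);    /* give other processes a chance to run */
    }
    return -1;                      /* gave up; caller carries on without the lock */
}
```

A call such as `lock_with_retries(&lock, 200, 1000)` would give up after 200 attempts roughly 1 ms apart; the numbers are illustrative only.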

The database write is a local hack to record some extra call data. I changed it to occur after the locks are released, as I should have done in the first place. 1.6.2.11 does not seem to have quite the same issue: it recovers after the usual 200 lock attempts and gets on with life much more happily. I cannot see any changes between 1.6.2.10 and 1.6.2.11 that would have improved this behavior.
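
The general shape of that fix can be sketched as follows: copy what you need while holding the lock, release it, and only then do the slow write. This is not the actual patch. `demo_channel`, `call_record`, `snapshot_call` and `record_call` are hypothetical names, and a `FILE *` stands in for the external database writer.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical stand-in for a channel; not Asterisk's struct ast_channel. */
struct demo_channel {
    pthread_mutex_t lock;
    char name[80];
    int duration_secs;
};

/* Plain copy of the fields we want to record. */
struct call_record {
    char name[80];
    int duration_secs;
};

/* Hold the lock only long enough to copy the data out. */
void snapshot_call(struct demo_channel *chan, struct call_record *out)
{
    assert(chan != NULL && out != NULL);
    pthread_mutex_lock(&chan->lock);
    snprintf(out->name, sizeof(out->name), "%s", chan->name);
    out->duration_secs = chan->duration_secs;
    pthread_mutex_unlock(&chan->lock);
}

/* The slow write happens here, with no channel lock held. */
int record_call(struct demo_channel *chan, FILE *db)
{
    struct call_record rec;

    snapshot_call(chan, &rec);
    return fprintf(db, "%s,%d\n", rec.name, rec.duration_secs) < 0 ? -1 : 0;
}
```

With this layout, a thread that wants the channel lock waits only for the duration of a small copy, however long the write itself takes.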

So in this case, the canary actually saved me from needing to reboot the machine in order to recover from the lockup. The thread monitoring the canary noticed within 60 seconds that the canary stopped updating the file and deprioritized Asterisk, allowing the other processes to proceed.
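
The canary idea can be sketched roughly like this: a watcher checks how long ago the canary file was last updated and, if it has gone quiet for too long, drops the process back to normal scheduling. This is a simplified illustration, not Asterisk's implementation; the function names are hypothetical, and the 60 second threshold is passed in by the caller.

```c
#include <assert.h>
#include <sched.h>
#include <sys/stat.h>
#include <time.h>

/* Has the canary been silent for longer than max_age_secs? */
int canary_is_stale(time_t last_update, time_t now, int max_age_secs)
{
    assert(max_age_secs >= 0);
    return now - last_update > max_age_secs;
}

/* Hypothetical watcher step: compare the canary file's mtime with the
 * current time. A file that cannot be read is treated as stale. */
int canary_file_is_stale(const char *path, int max_age_secs)
{
    struct stat st;

    if (stat(path, &st) != 0) {
        return 1;
    }
    return canary_is_stale(st.st_mtime, time(NULL), max_age_secs);
}

/* Drop the calling process back to the default, non-realtime scheduler. */
int drop_realtime_priority(void)
{
    struct sched_param param;

    param.sched_priority = 0;
    return sched_setscheduler(0, SCHED_OTHER, &param);
}
```

A watcher thread would call `canary_file_is_stale(path, 60)` periodically and call `drop_realtime_priority()` once it returns true.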

Many thanks,
Steve

 

7 thoughts on - Asterisk Server Hangs After Bridging 2 Channels

  • Steve Davies wrote:
    hello,

    have you already tried strace?
    you could just easily start asterisk with this command:

    strace asterisk -vvvvdddd

    or whatever options you want.
    maybe you could see some more information with this.
    you could also try

    time asterisk -vvvvvdddd

    to see whether it is user or system time you lose.
    and finally, from a second terminal, running rasterisk -x "core show locks"
    for example will also give you some information.

    best regards

    steve

  • [Re: strace] Yes, I tried this. Output just stops along with everything else and
    there are no clues.

    [Re: time] Interesting. I’ll try that.

    [Re: rasterisk] The whole system is locked. I cannot run anything 🙁

    Cheers,
    Steve

  • Steve Davies wrote:
    if you know in which function this happens you could also patch some
    ast_verbose calls into this function to see where it happens.

    another thing you could try is using gprof to see which functions take
    the most time. for this you need to compile asterisk like this:
    make ASTCFLAGS="-pg" ASTLDFLAGS="-pg"
    make install
    make install

    after this you will see a gmon.out file in the directory from where you
    have started asterisk (your home) and then you could use gprof with this
    gmon.out file.

    maybe you will find something with this.
    best regards

    steve

  • I found that doing a build with debug symbols included and running
    under gdb slowed down asterisk enough for me to get debug output.

    Thanks for the pointers. I’ll start a separate thread on working out
    what the hell is going on 🙂

    Cheers,
    Steve

  • Turns out that 1.6.2.11 fixes the symptom, and the cause is in my own
    “hack” to the code. Holding locks too long == verybad.

    Steve

  • Do you have high priority enabled in asterisk.conf?

    Are you using DAHDI? (maybe kernel not happy)

    I really thought that the canary should have sounded if Asterisk got in
    a loop – or maybe that only happens with high priority?

  • The canary only runs in high priority mode, and it’s only able to do anything
    if high priority scheduling is the culprit. If it’s something else, like
    memory swapping, there’s nothing the canary can do to fix that.