PJSIP Lockup

Asterisk Users 7 Comments

Hello All, I’m using Asterisk 16.8.0 on a CentOS 7 box. I was previously on 16.5.0 but recently upgraded to try to resolve this issue. I’m using the bundled PJSIP. The PBX uses MySQL realtime for most functions, and the MySQL server is on the same LAN as the Asterisk box.

As more users have been moved to this box, it has become unstable. At random, I start seeing “WARNING[12667] taskprocessor.c: The ‘pjsip/distributor-00000173’ task processor queue reached 500 scheduled tasks.”

At that point, running “pjsip show contacts” and “pjsip show endpoints” returns nothing, and the box stops responding to all SIP traffic.

The only way I’ve found thus far to resolve the issue is a “service asterisk restart”.

I can confirm that, while the issue is occurring, running “asterisk -x ‘core show taskprocessors’ | grep ‘distributor’” shows many items pending across all queues, and the count just keeps increasing. Normally, when all is fine, they’re all at 0.
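
The queue check described above can be scripted so a cron job or monitoring agent notices the backlog early. This is a minimal Python sketch, assuming each data row of “core show taskprocessors” lists the processor name, then the processed count, then the in-queue count (verify the column order against your version’s output). The function names are hypothetical:

```python
import subprocess
from typing import Dict

# Assumption: each data row of `core show taskprocessors` reads
#   <name> <processed> <in queue> <max depth> <low water> <high water>
# Verify the column order against your Asterisk version's output.
IN_QUEUE_COLUMN = 2


def distributor_backlog(cli_output: str) -> Dict[str, int]:
    """Map each pjsip/distributor-* taskprocessor to its current in-queue count."""
    backlog = {}
    for line in cli_output.splitlines():
        fields = line.split()
        if len(fields) <= IN_QUEUE_COLUMN or not fields[0].startswith("pjsip/distributor"):
            continue  # header, summary line, or a non-distributor taskprocessor
        try:
            backlog[fields[0]] = int(fields[IN_QUEUE_COLUMN])
        except ValueError:
            continue  # not a numeric data row
    return backlog


def read_taskprocessors() -> str:
    """Run `core show taskprocessors` on the live Asterisk via its remote console."""
    result = subprocess.run(
        ["asterisk", "-rx", "core show taskprocessors"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout


def main(threshold: int = 0) -> int:
    """Print distributor queues above `threshold`; return 1 if any, else 0."""
    stuck = {
        name: depth
        for name, depth in distributor_backlog(read_taskprocessors()).items()
        if depth > threshold
    }
    for name in sorted(stuck):
        print("{}: {} tasks queued".format(name, stuck[name]))
    return 1 if stuck else 0
```

Saved as a hypothetical check_distributors.py, it could be run from cron with python3 -c 'import sys, check_distributors; sys.exit(check_distributors.main())'. The default threshold of 0 reflects the observation that healthy queues sit at 0; a higher value would tolerate brief spikes.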

Google-fu hasn’t turned up anything for me beyond issues from 13.x that claim to be resolved. Since Asterisk isn’t fully crashing, I don’t think I can get a backtrace. Someone please correct me if I’m wrong. Any ideas? Tips?
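
Because the process is hung rather than crashed, a backtrace can still be taken by attaching a debugger to the live process while it is stuck. This is a minimal Python sketch, assuming gdb is installed, the script runs as root, and the PID file is at Asterisk’s default location (/var/run/asterisk/asterisk.pid; check astrundir in asterisk.conf). The function names are hypothetical:

```python
import subprocess
from typing import List, Optional

# Assumption: Asterisk's default run directory; adjust if astrundir differs.
DEFAULT_PIDFILE = "/var/run/asterisk/asterisk.pid"


def asterisk_pid(pidfile: str = DEFAULT_PIDFILE) -> int:
    """Read the main Asterisk daemon's PID from its PID file."""
    with open(pidfile) as f:
        return int(f.read().strip())


def gdb_backtrace_command(pid: int) -> List[str]:
    """gdb invocation that attaches to `pid`, dumps every thread's stack, then exits."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt full"]


def capture_backtrace(outfile: str, pid: Optional[int] = None) -> None:
    """Write backtraces of all threads of the hung Asterisk process to `outfile`."""
    target = pid if pid is not None else asterisk_pid()
    with open(outfile, "w") as out:
        # gdb detaches when the batch run ends; Asterisk is paused only while it dumps.
        subprocess.run(
            gdb_backtrace_command(target),
            stdout=out,
            stderr=subprocess.STDOUT,
            check=True,
        )
```

During the next lockup, capture_backtrace("/tmp/asterisk-hang.txt") (a hypothetical path) would produce a file suitable for attaching to a bug report. Reading the PID file rather than using pidof avoids accidentally attaching to an “asterisk -r” remote console, which runs under the same process name.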

*Nick Olsen*
Network Engineer Office: 321-408-5000 x103
Mobile: 321-794-0763

7 thoughts on “PJSIP Lockup”

  • Thanks for the info, Joshua.

    Does PJSIP handle database access the same way chan_sip did? We had a number of boxes running chan_sip against the same MySQL server without issue.

    We’re going to try to get a backtrace on the next occurrence. We’re also going to run a local copy of the database on the Asterisk box itself and point the system at it, just to “throw everything at the wall”.

    *Nick Olsen*

  • It uses the same underlying API and layer. However, it can access the database more frequently, both because of the queries it makes and because PJSIP is multithreaded.

  • We ultimately found this to be a voicemail issue. Voicemail is also stored in MySQL (via ODBC), and we found that the deadlock occurred when Asterisk attempted to play back a customer’s unavailable (“unavail”) greeting. It happened immediately, every time, throwing the same task processor warnings and making PJSIP completely unresponsive. We had imported a number of greetings from a legacy Asterisk system, and the vast majority of them worked. When we deleted the row containing the customer’s unavail greeting (making Asterisk fall back to reading the mailbox number), all issues went away. If we re-record the customer’s unavail greeting, it works fine and the problem doesn’t recur. This was one out of ~250 imported voicemails.

    Since then we’ve done a few more migrations, and they’ve all gone smoothly except for the most recent one, where ~50% of the imported greetings caused Asterisk to deadlock. We now check the greetings at migration time.

    What I can’t figure out is what Asterisk doesn’t like about these greetings. They worked fine on the previous Asterisk system, and the rows look identical to working ones. My only guess is that something is wrong with the BLOB holding the recording. It would be nice if Asterisk handled that more gracefully.

    I’m posting this mostly for the record, to hopefully help the next person who runs into the same issue.

    *Nick Olsen*

  • Hi All

    This sounds just like a problem I have had, and am still investigating, since moving to 16.9 with chan_sip. I’m still trying to reproduce it. From the debug output, it looks like the issue is either voicemail or call transfer, but I can’t reproduce it consistently.

    Voicemail uses ODBC, and I just imported the data from the old system into the new database.

    Nick, if you have any more info, I would be grateful.

    TIA

    Paddy

  • Paddy, it’s pretty easy to spot from the CLI.

    A voicemail gets called, and the console basically stops scrolling from there. Eventually you’ll get the task processor warning about 500 queued tasks, or something like that, and maybe channels trying to hang up due to lack of RTP (if you have RTP timeouts configured).

  • Given that the issue appears to be related to specific rows and not the database in general, you might want to get a backtrace while the system is locked, as Josh suggested earlier. Once you have the backtraces, open an issue at https://issues.asterisk.org.

    See https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace for instructions. Note: you do NOT need to recompile with DEBUG_THREADS, MALLOC_DEBUG, DONT_OPTIMIZE or BETTER_BACKTRACES, but the Asterisk binaries still need their symbols (i.e., they must be unstripped).
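
Since the problem rows in this thread turned out to be specific imported greetings, the migration-time check could be partly automated. This is a minimal Python sketch using the standard wave module, assuming the greetings are stored in Asterisk’s “wav” format (8 kHz, 16-bit mono PCM) and that the BLOB holds the complete file. Greetings in other formats (for example WAV49/GSM) would need a different check, and a clean parse does not prove Asterisk will play the file. The function name is hypothetical:

```python
import io
import wave
from typing import List

# Assumption: greetings use Asterisk's "wav" format (8 kHz, 16-bit, mono PCM).
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample
EXPECTED_RATE = 8000


def greeting_problems(blob: bytes) -> List[str]:
    """Return reasons a greeting BLOB looks malformed; an empty list means it parsed cleanly."""
    if not blob:
        return ["empty BLOB"]
    try:
        with wave.open(io.BytesIO(blob), "rb") as wav:
            channels = wav.getnchannels()
            width = wav.getsampwidth()
            rate = wav.getframerate()
            declared = wav.getnframes()
            if channels * width == 0:
                return ["bad fmt chunk (zero channels or sample width)"]
            # Read everything the header promises; a truncated BLOB yields fewer bytes.
            actual = len(wav.readframes(declared)) // (channels * width)
    except (wave.Error, EOFError) as exc:
        return ["unreadable WAV header: {}".format(exc)]

    problems = []
    if (channels, width, rate) != (EXPECTED_CHANNELS, EXPECTED_SAMPLE_WIDTH, EXPECTED_RATE):
        problems.append("unexpected format: {} ch, {}-bit, {} Hz".format(channels, width * 8, rate))
    if declared == 0:
        problems.append("no audio frames")
    elif actual < declared:
        problems.append("truncated: header declares {} frames, found {}".format(declared, actual))
    return problems
```

Running it over the recording BLOB of each imported greeting row, fetched with whatever MySQL client you already use, before cutover would flag unreadable or truncated recordings so they can be re-recorded, which the thread found resolves the lockup.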