Strange And Complete Failure Of Asterisk 1.8

Hi all

We’ve had a very strange failure on an Asterisk 1.8 install that has been running for about a year at a customer site.

The physical hardware is fine, and all other services on the CentOS 6.5 server are running. Only Asterisk is not working…

The first symptom was that no calls could be made from the SIP phones used with it, and no calls could be received over the SIP trunk connected to it.

I checked and noted that

sip show peers

in the CLI would either do nothing (i.e. just show asterisk*CLI> again, with no response) or it would return only this:

asterisk*CLI> sip show peers
Name/username Host Dyn Forcerport ACL Port Status
asterisk*CLI>

A module show like sip also does literally nothing, just

asterisk*CLI> module show like sip
asterisk*CLI>

Soon after this, I lost the ability to get any response from asterisk -r on the command line; it would just hang indefinitely.

Did a reboot, and then I couldn’t start asterisk at all; entering

# asterisk

would also just hang.

So, I recompiled asterisk from source and reinstalled the executable and all the module files. Still the same.

I happened to have an older asterisk executable from a few months before lying around and ran sha256sum on it. Its checksum differed from that of the non-working asterisk binary, BUT it turned out that the newly recompiled asterisk binary has the SAME SHA256 checksum as the non-working asterisk binary.
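The same comparison can be scripted when several binaries need checking. The following is a minimal sketch, assuming Python is available on the machine; the function names are illustrative and the paths are whatever is passed on the command line. Matching checksums only show that two files are byte-for-byte identical; they say nothing about whether something else on the system interferes with the process once it runs.

```python
import hashlib
import sys


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large binaries are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_binaries(paths):
    """Map each path to its digest and report whether all digests are equal."""
    digests = {path: sha256_of(path) for path in paths}
    all_identical = len(set(digests.values())) <= 1
    return digests, all_identical


if __name__ == "__main__":
    # Paths come from the command line; nothing is assumed about where
    # the Asterisk binaries live on a given system.
    if len(sys.argv) > 1:
        digests, all_identical = compare_binaries(sys.argv[1:])
        for path, digest in digests.items():
            print(f"{digest}  {path}")
        print("all identical" if all_identical else "digests differ")
    else:
        print("usage: pass two or more file paths to compare")
```

Run it with the old, the non-working and the freshly compiled binaries as arguments to see all three digests side by side.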

System seems fine otherwise, nothing relevant in /var/log/messages or dmesg indicating a hardware failure. /var/log/asterisk/messages also contains no strange warnings or errors.

Anybody got any idea why I cannot resuscitate my Asterisk install, even after recompiling it from scratch from source? Why would asterisk die like this in the first place?

Thanks

4 thoughts on - Strange And Complete Failure Of Asterisk 1.8

  • DNS failure could do this

    Asterisk used to get stuck in a synchronous DNS request wait state, which meant everything ground to a halt while it waited for a reply until DNS timed out.

    The recommended option was either to use IP addresses only or to run a DNS proxy that fails fast, thus letting Asterisk continue.

    Cheers Duncan
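One way to test the DNS theory is to look up each hostname that Asterisk has to resolve (for example the hosts named in the register lines) with a hard deadline. This is a minimal sketch, assuming Python is available; the function name is hypothetical, and the lookup goes through the system resolver rather than anything Asterisk-specific.

```python
import socket
import threading


def resolve_with_timeout(host, timeout=3.0):
    """Resolve `host` via the system resolver, giving up after `timeout` seconds.

    Returns ("ok", [addresses]), ("error", []) or ("timeout", []).
    """
    result = {}

    def worker():
        try:
            infos = socket.getaddrinfo(host, None)
            # Each entry ends with a sockaddr tuple whose first item is the address.
            result["addresses"] = sorted({info[4][0] for info in infos})
        except (socket.gaierror, UnicodeError) as exc:
            result["error"] = str(exc)

    # A daemon thread is used so a stuck lookup cannot keep the script alive.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)

    if thread.is_alive():
        return ("timeout", [])
    if "error" in result:
        return ("error", [])
    return ("ok", result["addresses"])
```

A "timeout" result for a trunk or registrar hostname would be consistent with the blocking behaviour described above, while a quick "ok" or "error" suggests the resolver itself is answering promptly.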

  • definitely DNS… check your Register lines…

    Markus

    On 27.05.2015 at 20:14, Duncan Turnbull wrote:

  • Well,

    I had exactly the same issue as you described.

    It turned out to be a piece of malicious software that was running on the server.

    The customer’s server was compromised due to a weak root password, and only the Asterisk process was targeted by the malicious program, which was embedded deep in the server.

    The exact details escape me, but I do remember that it took more than two days of tracing and conducting security forensics to locate the exact cause of asterisk totally failing (I remember doing some GDB and kernel level syscall tracing with the kernel symbols installed, it was an educational adventure…)

    The problem was that even when I did a recompile and fresh installation, the malicious software would still target the new asterisk executable.

    The attacking software was complicated, hard to detect and almost impossible to remove.

    When I realized that the server was deeply compromised, I reinstalled CentOS from scratch on the same hardware, hardened the root password and that was the end of this issue.

    I hope this might save you some frustration.

    Take care, Antoine Megalla
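Before reaching for GDB or syscall tracing, a first look at where a hung process is stuck can come from /proc. The sketch below assumes a Linux system with /proc mounted and Python available; it is not the procedure used in the investigation described above, only a possible starting point, and the function names are hypothetical. It reads the process state and, where the kernel exposes it, the wait channel.

```python
import os


def parse_proc_status(text):
    """Parse the 'Key: value' lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def describe_process(pid):
    """Return name, state and wait channel for `pid` (Linux only)."""
    base = os.path.join("/proc", str(pid))
    with open(os.path.join(base, "status")) as handle:
        status = parse_proc_status(handle.read())
    try:
        with open(os.path.join(base, "wchan")) as handle:
            wchan = handle.read().strip() or "0"
    except OSError:
        # The file may be missing or unreadable depending on kernel and permissions.
        wchan = "unavailable"
    return {
        "name": status.get("Name"),
        "state": status.get("State"),
        "wchan": wchan,
    }
```

A state beginning with D indicates an uninterruptible wait inside the kernel, which is a cue to look further with tools such as strace or gdb.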

    Date: Wed, 27 May 2015 13:55:22 +0200
    From: “Stefan Viljoen”
    To:
    Subject: [asterisk-users] Strange and complete failure of Asterisk 1.8
    Message-ID: <006101d09874$030f7f80$092e7e80$@verishare.co.za>
    Content-Type: text/plain; charset=”us-ascii”

  • Re: Strange and complete failure of Asterisk 1.8 (Duncan Turnbull)
    Re: Strange and complete failure of Asterisk 1.8 (Markus Weiler)

    Thanks Markus & Duncan

    Pulled the machine and replaced it with a brand new one. Same network and same DNS server active there.

    New system is running the same Asterisk (1.8.11.0) and the same CentOS, same Asterisk config files – still running beautifully since yesterday.

    Got the sick system at our lab now and am pulling it apart; even isolated from the LAN (i.e. with no access at all to a DNS server) it still exhibits the same problem.

    Connecting it to the LAN with our HeadOffice DNS server present (which also serves our “live” HO asterisk) it still exhibits the problem.

    But thanks anyway. If it happens again, we’ll just pull the live system and replace it from the ready reserve we maintain of already configured Asterisk boxes.

    I’ll keep digging awhile with the sick system and see if I can come up with something, will focus on DNS – thanks for the pointers!

    Kind regards

    Stefan