High Availability With Asterisk


March 6, 2014 Thorolf Godawa Asterisk Users 12 Comments

Hi everybody,

What are the current options for making an Asterisk system highly available?

Using two servers as active/passive with DRBD and Pacemaker/Corosync works very well: there are no voice quality issues, even on heavily loaded servers, and no problems with large numbers of small packets.

But this requires two machines for every Asterisk system, which is not economical in any way.

Using (para-)virtualization with Xen could be another option. On systems with low load this works reliably, but what happens on systems with high load? Are there any known issues with real-time behavior, packet loss, etc. because Asterisk runs in a VM?

The idea would be an HA cluster of two servers running Xen, each of them hosting one Asterisk instance in a single VM; on a failure, the VM is restarted on the other node.

This might result in a much higher load on that node, because it then runs two VMs, but for a short period, until the other node comes back, that might be tolerable.

Are there other options for running two Asterisk instances in parallel on one system, each bound to its own IP, maybe something with chroot or similar?

Thanks a lot,

12 thoughts on - High Availability With Asterisk

Michelle Dupuis says:

March 6, 2014 at 9:47 am

Some food for thought:

If you use DRBD, then you will mirror corruption from one system to the other. You also cannot selectively pick files in a folder to mirror (you will mirror a lot!). As well, DRBD struggles as peers are set further apart (latency) or as the number of changes increases.

A lot of HA tools don’t look deeper into Asterisk to see if/how it has failed (they only detect catastrophic failures). What happens when the Asterisk process is alive but no longer bridging calls?

If Asterisk or host processes misbehave and consume huge amounts of system resources, most HA tools cannot respond.

As a biased recommendation, take a look at HAAst at http://www.generationd.com. It takes care of moving a shared IP between hosts, as well as other features.

Michelle

(I work for Generationd 🙂)
Mitul Limbani says:

March 6, 2014 at 9:58 am

Hello,

Using a single server with multiple VMs essentially defeats the purpose, because it doesn’t protect against physical hardware failures.

To save costs, use a low-end box as the failover to keep you in business until the primary box is back online.

Mitul
Chris Bagnall says:

March 6, 2014 at 12:56 pm

In fairness, the tools the OP mentioned (pacemaker/corosync) can be set up to detect other failures than whether asterisk is alive – a simple one to set up is to try connecting on 5060 UDP and make sure you get an acknowledgement. Likewise, you could even set up a call using the manager interface to a dummy extension and make sure it completes successfully.
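The 5060/UDP check described above could look roughly like the following Python sketch. It sends a minimal SIP OPTIONS request and treats any SIP response as a sign of life. The function names `build_options` and `sip_alive`, the `healthcheck` user part and the timeout are assumptions made for illustration; this is not a Pacemaker resource agent, only the probe such an agent might call.

```python
import socket
import uuid


def build_options(target_host: str, target_port: int,
                  local_ip: str, local_port: int) -> bytes:
    """Build a minimal SIP OPTIONS request (RFC 3261) as bytes."""
    # The branch parameter must start with the RFC 3261 magic cookie.
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]
    lines = [
        f"OPTIONS sip:{target_host}:{target_port} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};branch={branch}",
        "Max-Forwards: 70",
        f"From: <sip:healthcheck@{local_ip}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <sip:{target_host}:{target_port}>",
        f"Call-ID: {uuid.uuid4().hex}@{local_ip}",
        "CSeq: 1 OPTIONS",
        "Content-Length: 0",
        "",  # the two empty entries produce the blank line ending the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def sip_alive(host: str, port: int = 5060, timeout: float = 2.0) -> bool:
    """Return True if the peer sends any SIP response before the timeout."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            # connect() on UDP only fixes the peer, so getsockname() is usable
            sock.connect((host, port))
            local_ip, local_port = sock.getsockname()
            sock.send(build_options(host, port, local_ip, local_port))
            reply = sock.recv(4096)
    except OSError:  # covers timeouts, refused ports and resolution errors
        return False
    # Any status line counts: even an error status shows the SIP stack answers.
    return reply.startswith(b"SIP/2.0 ")
```

Note that this only proves the SIP stack answers; it says nothing about whether calls are still being bridged, which is why the manager-interface test call is the stronger check.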

FWIW, we tend to use pacemaker with heartbeat rather than corosync, but both perform a pretty similar function.

Kind regards,

Chris
Markus says:

March 6, 2014 at 2:33 pm

Hi Thorolf,

On 06.03.2014 16:21, Thorolf Godawa wrote:

hmm, all my Asterisk instances run in (KVM) VMs, no issues there. But how is this related to high availability? I think it’s not. 🙂

I think the way to go for high availability (and scalability) is Kamailio! In a redundant setup, running on 2 separate physical machines (maybe in a VM, doesn’t matter). Then you make them failsafe using whatever tool(s) available. Then you can set up 1, 2, 10 or 100 Asterisk “behind” Kamailio and any of them could fail (but 1 🙂 ) and you will still be online.
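To make the "any of them could fail" idea concrete, here is a small Python sketch of round-robin dispatching that skips backends marked as down. It only illustrates the principle; it is not how Kamailio is implemented or configured, and the `Dispatcher` class and the backend names are hypothetical.

```python
class Dispatcher:
    """Round-robin over Asterisk backends, skipping the ones marked down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.down = set()
        self._next = 0  # index of the backend to try first on the next pick

    def mark_down(self, backend):
        self.down.add(backend)

    def mark_up(self, backend):
        self.down.discard(backend)

    def pick(self):
        """Return the next healthy backend, or raise if none is left."""
        n = len(self.backends)
        for offset in range(n):
            candidate = self.backends[(self._next + offset) % n]
            if candidate not in self.down:
                # Continue after the chosen backend on the following call.
                self._next = (self._next + offset + 1) % n
                return candidate
        raise RuntimeError("no Asterisk backend available")
```

With three backends and one marked down, calls simply alternate between the remaining two; only when every backend is down does `pick()` fail, which matches the "(but 1)" caveat above.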

If you want to further develop the high availability thought, then you could use CephFS which will give you self-healing, 100% available storage over multiple physical storage servers. There you could store your Asterisk config files, or your MySQL database used by all the Asterisk servers, for CDRs, SIP registrations etc. It’s kinda slow, but I think fast enough for Asterisk / MySQL. 🙂

And, to scale and to make the Asterisk nodes redundant (redundancy is not really needed anymore, since Kamailio takes care of that, but basically then you get also VM/physical redundancy), you could look into OpenNebula which provides a nice auto-scaling feature already out of the box. If there’s load on your Asterisk VMs, OpenNebula will detect this and spawn new Asterisk VMs (probably on different physical servers, otherwise it doesn’t make that much sense performance-wise) which will automagically receive requests/calls from Kamailio. If the load goes down, the VM can be automagically stopped again to free resources for other VMs/applications. OpenNebula is less popular than OpenStack, which seems to be the first choice for Cloud-stuff today, but what I liked about OpenNebula is that it provides the auto-scaling feature already in the customer-facing web-frontend out-of-the-box, unlike OpenStack. So you could offer your customers a self-managed, redundant Asterisk cloud or something like that. 🙂

In theory, this combination should give you a 100% redundant, auto-healing, auto-scaling VoIP setup. 🙂

Regards Markus
Paul Belanger says:

March 7, 2014 at 10:30 am

Correct, in this case para-virt is not the way to go. You’ll want to use a virtualization platform that does support multi-hardware with live migration support.
Paul Belanger says:

March 7, 2014 at 10:31 am

+1 to this post. A lot of good information here.
Adolphe Cher-Aime says:

March 7, 2014 at 10:33 am

Good post. Actually this is the architecture we have.

Johann Steinwendtner says:

March 7, 2014 at 10:53 am

Sorry for the stupid question, but what happens if Kamailio fails?

Thanks.

regards

Hans
Gareth Blades says:

March 7, 2014 at 11:47 am

We have two copies on different servers which make use of keepalived to provide a virtual IP address between them. We also have them connected to two databases with active-active replication.
says:

March 8, 2014 at 2:28 pm

My approach (in theory only, so please correct me if I’m wrong) would be to run Asterisk on multiple boxes (one instance each). A dedicated monitoring box (nagios? custom scripts?) would perform frequent checks against the boxes (in one of my previous projects, one Asterisk instance used call files to demonstrate its health to another one).

If a box fails, I would simply redirect/reroute its traffic to another one, using network solutions. Such as shutting down the production interface of a suspectedly failed asterisk box, having an idle one pick up its IP address, or using load balancing / routing / NAT to redirect the client’s traffic to a standby box.
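A rough Python sketch of the decision logic such a monitoring box might run: it counts consecutive failed health checks and only switches the active box after a threshold, so a single lost probe does not trigger a failover. The class name `FailoverMonitor`, the box names and the threshold of 3 are assumptions for illustration; actually moving the IP address or changing the routing is left out.

```python
class FailoverMonitor:
    """Track health-check results and decide which box should be active."""

    def __init__(self, primary: str, standby: str, max_failures: int = 3):
        self.primary = primary
        self.standby = standby
        self.max_failures = max_failures  # consecutive failures before failover
        self.active = primary
        self._failures = 0

    def record(self, healthy: bool) -> str:
        """Feed one check result for the active box; return the box to use."""
        if healthy:
            self._failures = 0  # any success resets the counter
        else:
            self._failures += 1
            if self._failures >= self.max_failures:
                # Swap roles: the other box takes over the service address.
                self.active = (
                    self.standby if self.active == self.primary else self.primary
                )
                self._failures = 0
        return self.active
```

The returned name is where the monitoring script would then point the shared IP, NAT rule or load balancer.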

My approach is based on the experience that Linux-based HA tools are often not free, or don’t scale well, or are engineered to recover from an error more slowly (e.g. booting a second VM takes too much time).
However, in the network world there are well-known protocols that were designed to take over in a matter of milliseconds.

I do understand that this would not preserve ‘session’ data, so failing over to a different box would mean the need to re-register, could cause calls to drop, etc. This might be unacceptable for you. As I said in the beginning, I haven’t built such systems, but in my experience a dropped call is not that big of a deal if it happens because the network cuts over to a different box. This could be handled with a pair of frontend load balancers, where the number of Asterisk boxes can be transparent.

hope this helps

adam
Brynjolfur Thorvardsson says:

March 9, 2014 at 4:24 am

Hi all

Thanks for an interesting discussion.

I’ve looked at various options for load balancing Asterisk servers and providing fail over support.

One thing that is not clear to me: what happens to queues in a load-balancing environment? On our server, we have various queues with up to 20 incoming calls waiting in each, with typically 1-5 queue members. If incoming calls get placed randomly (or according to some heuristic) on different servers, is there any way that Asterisk can handle queue functionality?

Our client sip phones can enter or leave queues as they wish, but each sip phone is only registered on one server at a time – so queue members could be registered at different servers in a load balancing environment. Same goes for incoming calls, going to different servers but eventually ending up in the same queue.

I’m not sure if queues would ever work in a load balancing scenario, and I haven’t found any information on the net to tell me otherwise. Does anybody have any experience/knowledge of if and how it could work?

Best regards

Binni

Hans Witvliet says:

March 9, 2014 at 3:16 pm

Hi Adam,

Don’t confuse “high availability” with “load balancing”, as these two are not related. They have totally different objectives and are achieved in different ways. Either or both of them can very well be achieved with open-source tools.

Even with commercial software, maintaining a call when an intermediate PABX breaks down is nearly impossible