Problems With The Voice Quality Under Load
B.H.
Hello, all 🙂
We have a cluster of Asterisk (v. 11.9) servers that host IVR applications. The servers sit behind a SIP proxy (Kamailio) for load balancing.
All servers are dual-processor machines with 8-10 cores per CPU.
When a particular server reaches about 500 concurrent calls, the sound quality begins to degrade: the audio plays slowly and with clicks. As far as I understand, it’s because Asterisk is unable to send the voice stream in time, i.e. the server is overloaded.
What I don’t understand is that, while the server appears to be overloaded and the audio quality is bad, the server’s actual CPU load is no more than 30-40% (60-70% idle on average). IMHO, this indicates that for some reason the server is unable to use its CPU capacity efficiently. Maybe because of some kind of thread contention inside Asterisk?
I have read blogs that advise dividing a physical server into several VMs and claim that this improves the total capacity. In my own experience, this did not work very well, and it seems the virtualization actually made the quality worse.
Do you have any advice for me (other than purchasing more servers 😉)?
Thanks!
4 thoughts on - Problems With The Voice Quality Under Load
Have you done the math for the network connections? BTF and external. What bit rates for the sound?
What codecs?
How are calls coming in: SIP or analogue?
Disks OK (low IO per second)? Caching working OK?
CPU may not be the problem if your CPU utilization is really that low.
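For the network math, a back-of-envelope sketch could look like the one below. It assumes G.711 (64 kbit/s) with 20 ms packetization over IPv4, which nobody in the thread has confirmed, and it counts only IP, UDP and RTP headers (no Ethernet framing). The function name `rtp_load` and its parameters are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class RtpLoad:
    packets_per_second: float  # per direction, all calls together
    mbit_per_second: float     # per direction, measured at the IP layer


def rtp_load(calls: int, codec_kbps: float = 64.0, ptime_ms: float = 20.0) -> RtpLoad:
    """Estimate RTP packet rate and bandwidth for a number of concurrent calls.

    Assumptions (not from the thread): G.711 at 64 kbit/s, 20 ms per packet.
    """
    # Per-packet overhead: IPv4 header (20) + UDP header (8) + RTP header (12).
    header_bytes = 20 + 8 + 12
    pps_per_call = 1000.0 / ptime_ms
    payload_bytes = codec_kbps * 1000 / 8 * (ptime_ms / 1000.0)
    bits_per_call = (payload_bytes + header_bytes) * 8 * pps_per_call
    return RtpLoad(
        packets_per_second=calls * pps_per_call,
        mbit_per_second=calls * bits_per_call / 1e6,
    )


load = rtp_load(500)
print(f"{load.packets_per_second:.0f} packets/s, "
      f"{load.mbit_per_second:.1f} Mbit/s per direction")
# prints "25000 packets/s, 40.0 Mbit/s per direction"
```

Under these assumptions the bandwidth is modest for a gigabit link, but the server still has to handle roughly 50,000 small packets per second when both directions are counted. A different codec or packetization interval changes both numbers.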
Ron
Perhaps it’s not CPU that is blocking things, but I/O? It should be visually and audibly obvious if this is the case — the disk activity lights will be illuminated, and the HDDs will be making noises.
If you can spare the RAM, consider setting up a tmpfs to hold your IVR prompts in memory.
You may be using a feature (e.g., MeetMe) that is single-threaded. If you view the system using ‘htop’ instead of ‘top’ (or press ‘1’ while running top), you may see that a single CPU is maxed out while the others are relatively idle.
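If you would rather log the per-core picture than watch it in htop, something along these lines might do. It is only a sketch: it compares two snapshots of `/proc/stat` on Linux and reports what share of the interval each core spent in each state. The function names and the sample snapshots are invented for illustration.

```python
# First eight per-core columns of /proc/stat, in the order the kernel prints them.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_proc_stat(text):
    """Return {core_name: {field: jiffies}} for the per-core 'cpuN' lines."""
    cores = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip the aggregate 'cpu' line and everything that is not a CPU line.
        if not parts or not parts[0].startswith("cpu") or parts[0] == "cpu":
            continue
        values = [int(v) for v in parts[1:1 + len(FIELDS)]]
        cores[parts[0]] = dict(zip(FIELDS, values))
    return cores


def core_shares(before_text, after_text):
    """Percent of the interval each core spent in each state."""
    before, after = parse_proc_stat(before_text), parse_proc_stat(after_text)
    shares = {}
    for core, end in after.items():
        start = before.get(core)
        if start is None:
            continue
        delta = {field: end[field] - start[field] for field in end}
        total = sum(delta.values())
        if total > 0:
            shares[core] = {f: 100.0 * d / total for f, d in delta.items()}
    return shares


# Made-up snapshots; on a real host, read /proc/stat twice a second or so apart.
BEFORE = """cpu  200 0 100 1600 0 0 100 0 0 0
cpu0 100 0 50 800 0 0 50 0 0 0
cpu1 100 0 50 800 0 0 50 0 0 0
intr 12345
"""
AFTER = """cpu  215 0 135 1730 0 0 120 0 0 0
cpu0 110 0 80 840 0 0 70 0 0 0
cpu1 105 0 55 890 0 0 50 0 0 0
intr 12999
"""

for core, share in core_shares(BEFORE, AFTER).items():
    print(core, {state: round(pct, 1) for state, pct in share.items() if pct})
```

In the made-up sample, cpu0 is busy with system and softirq work while cpu1 is mostly idle, which is the kind of imbalance an average over all cores would hide.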
B.H.
Hi, thanks a lot for all the replies 🙂
Here are my answers:
I did check the CPU load and it is distributed evenly between all the cores. Most of the load is in “system”, probably network handling in the kernel.
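One way to test the "network handling in the kernel" guess might be to watch the kernel's UDP counters, since RTP runs over UDP. The sketch below parses the `Udp:` header and value lines of `/proc/net/snmp` on Linux and reports how much each counter grew between two snapshots. The function names and sample numbers are invented, and growing `InErrors` or `RcvbufErrors` would only be a hint, not proof of the cause.

```python
def udp_counters(snmp_text):
    """Return {counter_name: value} from the two 'Udp:' lines of /proc/net/snmp."""
    header = values = None
    for line in snmp_text.splitlines():
        parts = line.split()
        # Exact match, so the separate 'UdpLite:' section is ignored.
        if not parts or parts[0] != "Udp:":
            continue
        if header is None:
            header = parts[1:]          # first 'Udp:' line holds the names
        else:
            values = [int(v) for v in parts[1:]]  # second one holds the numbers
            break
    if header is None or values is None:
        return {}
    return dict(zip(header, values))


def udp_counter_growth(before_text, after_text):
    """How much each UDP counter increased between two snapshots."""
    before, after = udp_counters(before_text), udp_counters(after_text)
    return {name: after[name] - before.get(name, 0) for name in after}


# Made-up snapshots; on a real host, read /proc/net/snmp twice under load.
SNMP_BEFORE = """Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
Udp: 1000000 10 5 990000 5 0
UdpLite: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
UdpLite: 0 0 0 0 0 0
"""
SNMP_AFTER = """Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
Udp: 1250000 10 305 1240000 305 0
UdpLite: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
UdpLite: 0 0 0 0 0 0
"""

growth = udp_counter_growth(SNMP_BEFORE, SNMP_AFTER)
print({name: delta for name, delta in growth.items() if delta})
```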
We do use ConfBridge a lot, but all conferences are on a single server and the issue is on all the servers.
What other features are single-threaded?
Asterisk is almost the only thing that runs on the PBX servers. There are also some PHP AGI scripts (5-10 per minute per server on average), and most of the AGI functions are off-loaded to a Java application on the central server via FastAGI. But there is still a lot of dialplan logic that runs on the PBXs. Maybe the huge dialplan is the key to the issue?