ERROR During High Volume MoH Dialplan
Hello,
I’ve recently setup a small load test against an instance of Asterisks. I’ve tested on asterisk 13.5 and 14.6 with the same results.
I am using PJSIP. My dial plan is,
[test]
exten => 1001,1,Answer
exten => 1001,n,MusicOnHold(15)
exten => 1001,n,Hangup
I am using SIPP to test. I can share XML if desired but it simply waits on the line while music plays for 8 seconds. I used sippycup to generate it with the following steps in the yaml file.
steps:
– invite
– wait_for_answer
– ack_answer
– sleep 8
– send_bye
At around 500 calls per second I begin to see the following ERRORs,
[Aug 28 17:46:14] ERROR[26150][C-00005594]: frame.c:343 ast_frdup: Excessive refcount 100000 reached on ao2 object 0x26bffc0
[Aug 28 17:46:14] ERROR[26150][C-00005594]: frame.c:343 ast_frdup: FRACK!, Failed assertion Excessive refcount 100000 reached on ao2 object 0x26bffc0 (0)
Got 19 backtrace records
#0: [0x45d229] /usr/sbin/asterisk(__ao2_ref+0x1a9) [0x45d229]
#1: [0x526ce6] /usr/sbin/asterisk(ast_frdup+0x116) [0x526ce6]
#2: [0x5fa616] /usr/sbin/asterisk(ast_translate+0x306) [0x5fa616]
#3: [0x4bf16b] /usr/sbin/asterisk(ast_write+0x104b) [0x4bf16b]
#4: [0x7efeb578230b] /usr/lib/asterisk/modules/res_musiconhold.so(+0x430b) [0x7efeb578230b]
#5: [0x4b5b52] /usr/sbin/asterisk() [0x4b5b52]
#6: [0x4c259c] /usr/sbin/asterisk() [0x4c259c]
#7: [0x4c4a45] /usr/sbin/asterisk() [0x4c4a45]
#8: [0x7efeb578478d] /usr/lib/asterisk/modules/res_musiconhold.so(+0x678d) [0x7efeb578478d]
#9: [0x58ec79] /usr/sbin/asterisk(pbx_exec+0xb9) [0x58ec79]
#10: [0x582e84] /usr/sbin/asterisk() [0x582e84]
#11: [0x584e7c] /usr/sbin/asterisk() [0x584e7c]
#12: [0x5863fb] /usr/sbin/asterisk() [0x5863fb]
#13: [0x60002a] /usr/sbin/asterisk() [0x60002a]
I’ve also seen similar behavior when using playback instead of MusicOnHold. CPU usage gets around 50%. Can anyone enlighten me on the meaning and cause of the error? Is there some steps (config etc) that can be taken to alleviate the issue?
Thanks Joseph
16 thoughts on - ERROR During High Volume MoH Dialplan
I think you mean 13.15.0 as the excessive ref count trap is not in 13.5.0.
This inline backtrace would be more useful if you had BETTER_BACKTRACES
enabled.
This particular FRACK is meant to help find ao2 object reference leaks. It acts as an early warning for excessive references to any particular ao2
object used in the code. The FRACK itself is benign. Based upon the inline backtrace the ao2 object is likely to be a codec format.
* What codecs are you using in this setup?
* With 500 calls/sec and the calls lasting 8 seconds that comes to 4000
active channels. Hitting the FRACK would result in an average of 25
references to the format per channel. This is a simplistic calculation as there are going to be some references that have nothing to do with a call. The number of base references would depend upon which codec is involved.
* There is no user configurable option to change the excessive ref count trigger value. However, you could change the EXCESSIVE_REF_COUNT define value in the main/astobj2.c file and recompile.
Richard
Hi Richard,
Thank you for the reply
Correct, I did mean 13.15.
I set no optimize and better backtrace through “make menuselect” and the output is now
[Aug 28 21:41:16] ERROR[17171][C-0000392d]: frame.c:343 ast_frdup: FRACK!, Failed assertion Excessive refcount 100000 reached on ao2 object 0x21962b0 (0)
Got 26 backtrace records
#0: [0x61923f] main/utils.c:2475 __ast_assert_failed() (0x6191bb+84)
#1: [0x45ffc9] main/astobj2.c:543 __ao2_ref() (0x45fc3d+38C)
#2: [0x5320ce] main/frame.c:345 ast_frdup() (0x531e4c+282)
#3: [0x531a99] main/frame.c:196 ast_frisolate() (0x531a76+23)
#4: [0x60be51] main/translate.c:459 ast_trans_frameout() (0x60bd6e+E3)
#5: [0x60be75] main/translate.c:464 default_frameout()
#6: [0x60c46a] main/translate.c:579 ast_translate() (0x60c192+2D8)
#7: [0x4c0bf1] main/channel.c:5290 ast_write() (0x4bfb3e+10B3)
#8: [0x7fdef8345486] res/res_musiconhold.c:455 moh_files_generator()
#9: [0x4ba212] main/channel.c:3014 generator_force()
#10: [0x4bc23d] main/channel.c:3872 __ast_read()
#11: [0x4be29b] main/channel.c:4399 ast_read() (0x4be27e+1D)
#12: [0x4b6312] main/channel.c:1568 ast_safe_sleep_conditional() (0x4b6229+E9)
#13: [0x4b64c9] main/channel.c:1613 ast_safe_sleep() (0x4b64a1+28)
#14: [0x7fdef8346caa] res/res_musiconhold.c:834 play_moh_exec()
#15: [0x5970a3] main/pbx_app.c:491 pbx_exec() (0x596f87+11C)
#16: [0x582edf] main/pbx.c:2923 pbx_extension_helper()
#17: [0x586c30] main/pbx.c:4155 ast_spawn_extension() (0x586bcc+64)
#18: [0x5878bb] main/pbx.c:4328 __ast_pbx_run()
#19: [0x589061] main/pbx.c:4651 pbx_thread()
#20: [0x61624e] main/utils.c:1233 dummy_start()
* What codecs are you using in this setup?
In pjsip.conf I have disallow=all and allow=ulaw. If I can provide more information or a better response to this question please guide me on how to do that.
Thanks Joseph
menuselect => Compiler Flags => Better Backtraces
Doug
Please ignore the noise, I need to slow down when I read.
Doug
Is there any more information I can provide to give insight to these errors?
Any further advice on avoiding these during high call volume?
I was hoping Asterisk would handle more than 4k simultaneous calls.
Thanks
Joseph
* There is no user configurable option to change the excessive ref count trigger value. However, you could change the EXCESSIVE_REF_COUNT define value in the main/astobj2.c file and recompile.
Richard
I know from experience that Asterisk can handle more than 4k simultaneous calls, however it’s an extreme case to have all of them playing music on hold.
I think that if you tested 4k simultaneous calls with standard media streams on the majority of them, you would not experience the problem.
Is this a real problem for you – that Asterisk can’t manage 4k MoH sessions simultaneously, even though it can manage 4k standard phone calls?
Antony.
It is meant to simulate simultaneous calls on an IVR. I have also tested with a separate set of audio files closer to what the actual IVR menu. This produced the same result.
I apologize for not clearly stating the use case up front. I will try to give a bit more detail on that now.
I have an IVR menu and submenu that users may dial into. I initially tested with the IVR audio files. When I began experiencing this issue I used MoH as an attempt to narrow down the problem to the simplest dialplan possible.
If I continue my test at this volume or a higher volume, I begin to get errors about reaching the maximum queue size for that particular taskprocessor. Since, these error proceeded that I thought that they may be the key to preventing the queue from maxing out.
It sounds like Richard is saying that these refcount logs may not actually be errors and can be ignored in this scenario. If that is the case then is there anything that can be done about the task processor queue size? Is that simply a side effect of having so many callers listening to the IVR at the same time?
pjsip.conf is currently setup with a trunk allowing incoming calls from a specific IP. This is the task processor that is maxing out.
So I am looking for a better way to allow several thousand callers to listen to this IVR menu at the same time.
Thank you for the feedback thus far.
Any info and advice is helpful.
Thanks Joseph
An alternative that comes to mind is to have 1 conference with 1 channel playing MoH in it and then add callers in a muted state to it. Never tried this, don’t know if it fits your case.
I’m thinking multiple hosts.
I’m not a fan of 4,000 eggs in one basket.
+1 for horizontal scaling as the best solution in this situation.
Pete
Thanks for the feedback.
I do agree with having multiple smaller servers. When I was first approached with this task I mentioned as much. However, the current desire is to work with already existing hardware. That is out of my hands at the moment unless it just can’t be done. I will explore Freeswitch a bit soon to compare it as well.
I am struggling to find what the bottle neck is in this scenario. Does anyone have any advice on what that could be or on steps to discover it? Do you think that tasks are pooling up because of transcoding? If so would it help to change the codec that is being used? I am not sure about the MoH but the audio files I am using are gsm.
Thanks
Joseph
In article, Joseph Smith wrote:
You will find it less taxing on the server if you have MoH files and sounds files available in all the possible native formats. Then Asterisk can use the appropriate one for the channel without transcoding.
On my systems I have MoH and sounds installed in wav, ulaw, alaw, gsm and g729.
They will also sound better than transcoding from the gsm versions.
Cheers Tony
Thanks for the suggestion Tony,
I installed each codec for MoH, core sounds, and extra sound packages. Unfortunately the tests produce the same results.
[Sep 1 20:36:45] ERROR[10081][C-00007fe5]: frame.c:343 ast_frdup: FRACK!, Failed assertion Excessive refcount 100000 reached on ao2 object 0x20380b0 (
continuously for a while followed by a
[Sep 1 20:36:46] WARNING[7761][C-0000770d]: taskprocessor.c:888 taskprocessor_push: The ‘subp:PJSIP/sipp-00000020’ task processor queue reached 500 scheduled tasks.
Then this time Asterisk actually crashed. 🙁
——=_NextPart_001_0073_01D32341.E9678B80
Content-Type: text/plain;
charset=”us-ascii”
Content-Transfer-Encoding: 7bit
I had that problem before – I believe “task processor queue reached 500
scheduled tasks” crashing means your CDR records (queue) are being written as the call ends, and if you had many thousands of entries being written to disk it crashes asterisk (each ring to one phone is an entry, so it goes up fast – for example 10 busy phones, with a between-ring delay of 1
second means every second there are 10 entries being put in memory)
I was using a MySQL CDR, but I had left the “CSV” type of CDR on. I
removed/disabled the CSV CDR module, kept on the SQL CDR only and things have been working fine ever since.
Mike
From: asterisk-users-bounces@lists.digium.com
[mailto:asterisk-users-bounces@lists.digium.com] approached with this task I mentioned as much. That is out of my hands at the moment unless it as well. anyone have any advice on what that could be or because of transcoding? If so would it help to change files I am using are gsm.
You will find it less taxing on the server if you have MoH files and sounds files available in all the possible native formats. Then Asterisk can use the appropriate one for the channel without transcoding.
On my systems I have MoH and sounds installed in wav, ulaw, alaw, gsm and g729.
They will also sound better than transcoding from the gsm versions.
Cheers Tony
–_000_CY4PR2201MB14642220BB9A07CA7AA5EE6BA8960CY4PR2201MB1464_
Content-Type: text/plain; charset=”Windows-1252″
Content-Transfer-Encoding: quoted-printable
Thank you for the response Mike,
I did run into a CDR bottleneck as well and have already disabled it,
Module Description Use Count Status Support Level
0 modules loaded
# grep enable= /etc/asterisk/cdr.conf enable=no
At this point I’m really just not sure what the current bottleneck is and how to prevent the tasks for pooling. I expected that the CPU would cap out before this occurred. I do feel like there must be something I’m missing but just can’t to it.
Any further suggestions are very welcome.
Thanks Joseph