RTCP + Stasis Causing High Memory Consumption
Hello Asterisk list,
I’ve facing a memory allocation issue that happens occasionally but on a consistent basis. The problem happens as follow, suddenly Asterisk starts consuming a lot of memory, in a rate of more than 1GB per hour. Kernel will eventually kill it via the OOM killer when memory is really exausted… This situation does not generate backtrace because Asterisk is responsive during the whole time and can be stopped/started. After compiling Asterisk with the MALLOC_DEBUG flag, when the problem happened we could see this output with the most consuming files, using
“memory show summary” command:
# cat memory_show_summary_201711101144.log | sort -nr | head
3060904798 bytes allocated (431496 in caches) in 16185877 allocations
1467180225 bytes in 1647238 allocations in file stasis_channels.c
1217035109 bytes in 10229320 allocations in file json.c
240064732 bytes in 1765287 allocations in file stasis_message.c
56402000 bytes in 1762250 allocations in file taskprocessor.c
26125344 bytes in 203171 allocations in file rtp_engine.c
17308827 bytes in 307848 allocations in file stasis_cache.c
9548128 bytes in 35482 allocations in file stasis_bridges.c
3923172 bytes in 92169 allocations in file res_rtp_asterisk.c
3099771 bytes in 29185 allocations in file strings.c
Next minute it was already like this:
# cat memory_show_summary_201711101145.log | sort -nr | head
3091009032 bytes allocated (375561 in caches) in 16345842 allocations
1482476997 bytes in 1663899 allocations in file stasis_channels.c
1228978956 bytes in 10329516 allocations in file json.c
242483220 bytes in 1783070 allocations in file stasis_message.c
57063120 bytes in 1782910 allocations in file taskprocessor.c
26354360 bytes in 205090 allocations in file rtp_engine.c
17481272 bytes in 310935 allocations in file stasis_cache.c
9628576 bytes in 35782 allocations in file stasis_bridges.c
3828232 bytes in 93040 allocations in file res_rtp_asterisk.c
3125118 bytes in 29427 allocations in file strings.c
So as we can see, total allocations increased 159965 or 30MB in one minute…… A regular “memory show summary” during the day would be like this:
# cat memory_show_summary_201711140400.log | sort -nr | head
30595537 bytes allocated (406915 in caches) in 77906 allocations
11021007 bytes in 5081 allocations in file stasis_channels.c
3825007 bytes in 12643 allocations in file cdr.c
1855344 bytes in 15837 allocations in file pbx.c
1766744 bytes in 183 allocations in file translate.c
1467681 bytes in 1451 allocations in file chan_sip.c
1400000 bytes in 280 allocations in file res_rtp_asterisk.c
980474 bytes in 4740 allocations in file xmldoc.c
895369 bytes in 167 allocations in file file.c
800900 bytes in 2397 allocations in file channel.c
280728 bytes in 613 allocations in file rtp_engine.c
243147 bytes in 2556 allocations in file stasis.c
130066 bytes in 1082 allocations in file stasis_cache.c
24928 bytes in 82 allocations in file stasis_bridges.c
51796 bytes in 486 allocations in file stasis_message.c
We also tried to get a “memory show allocations” when the problem was happening. Unsurprisingly it halted Asterisk and after we cancelled it, it had generated a 1GB file. Both memory logs hinted us to some problem in the RTP module, specifically in the RTCP handling. After investigating the code, looks like for some reason, the res_rtp_asterisk.c (line 4116) when doing the ast_rtcp_read() ended up entering in a loop, causing lots of allocations in the rtp_engine.c (line 2047 – ast_rtp_publish_rtcp_message). Also apparently, another loop (or maybe the same, not sure…) happened calling ast_channel_publish_blob() because we got lots of create_channel_blob_message and ast_channel_snapshot_create allocations too. Json, taskprocessor, stasis_message and stasis_cache allocations seems to be side effect of so many stasis messages being created and published. We could check memory logs 3 times for this problem and they are similar so it seems that a bug exists somewhere leading to this problem. So anyone has any idea of what could be happening or if it may be related to some known bug?
We are running Asterisk 13.6.0 in CentOS 6.6.
Thanks very much!
Best regards, Patrick Wakano
2 thoughts on - RTCP + Stasis Causing High Memory Consumption
There have been a lot of bug fixes in this area since Asterisk 13.6.0. I’d highly recommend upgrading your version of Asterisk to the latest 13.x to see if it resolves the issue.
If not, and you don’t need the RTCP related events in either AMI or ARI, you can permanently disable them in stasis.conf:
[declined_message_types]
decline=ast_rtp_rtcp_sent_type
decline=ast_rtp_rtcp_received_type
While that won’t completely remove all processing of RTCP related information, it will dramatically reduce the amount of work Asterisk does when those messages are generated.
If that doesn’t fix it, then you may have some form of malformed RTCP
packet that is causing Asterisk to think that it has a slew of SR/RR
reports to generate. You may want to look at the RTCP information in wireshark to determine how many RR/SR reports are being generated in the packets.
Thank you very much for the response Matthew!
I’ll try the stasis configuration change first. I can’t say if a malformed RTCP packet triggered the issue, but I did actually checked a tcpdump while problem was happening and the amount of RTCP packets and AMI events were reasonably normal…. Apparently it is an internal only problem, not actually outputting abnormal traffic in the network….
Best regards!
Patrick Wakano