Thanks for replying.
I suspected something like that, though repeatedly running
lsof | wc -l
always stays fairly low – around 100 000 open files, which is still 8 times less than the system maximum of 800 000, as confirmed by running ulimit -n.
I also note that this number will climb to about 125 000 but never go higher than that, then, as calls hang up, decrease again – even during periods when the CLI is spammed with hundreds of “broken pipe” errors due to insufficient file descriptors, the count never goes beyond 125 000 of the available 800 000 open files.
If I grep the output of lsof for asterisk, the few thousand lines I have looked at all seem to indicate legitimate use – there are at least two descriptors for each conversation in progress (I assume for inbound and outbound RTP), plus one for each file being MixMonitored (which also seems logical),
and also one connection per active call to res_timing_dahdi – which all looks correct…
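As a cross-check on the lsof numbers, the descriptor count for just the Asterisk process can be read straight from procfs, which avoids lsof listing the same descriptor table once per thread. A quick sketch, assuming a single running asterisk process on Linux:

```shell
# /proc/<pid>/fd holds one symlink per descriptor the process
# actually has open, so counting entries gives the true fd count
# for that process alone (run as root or the asterisk user).
PID=$(pidof asterisk)
ls /proc/"$PID"/fd | wc -l
```

Watching that count alongside the CLI makes it easy to see whether the “broken pipe” bursts line up with an fd spike.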
Ok, I will have to consider that. The thing is, the problem is not consistent
– I can (for example) run 60 calls with no problems and no reported failures in opening files; then calls will -decrease- to about 40 and later spike to 70, yet at around 50 calls the errors come up thousands of times in the CLI, then suddenly stop as the calls -increase- – which doesn’t make sense. But this kind of behaviour does seem consistent with a possible leak.
I have now run
/usr/bin/prlimit --pid `pidof asterisk`
and I have noticed that even though I have 800 000 files specified, the ACTUAL limit in place on the Asterisk process is only 1024 open files?!
# prlimit --pid `pidof asterisk`
RESOURCE    DESCRIPTION                          SOFT       HARD       UNITS
AS          address space limit                  unlimited  unlimited  bytes
CORE        max core file size                   unlimited  unlimited  blocks
CPU         CPU time                             unlimited  unlimited  seconds
DATA        max data size                        unlimited  unlimited  bytes
FSIZE       max file size                        unlimited  unlimited  blocks
LOCKS       max number of file locks held        unlimited  unlimited
MEMLOCK     max locked-in-memory address space   65536      65536      bytes
MSGQUEUE    max bytes in POSIX mqueues           819200     819200     bytes
NICE        max nice prio allowed to raise       0          0
NOFILE      max number of open files             1024       4096
NPROC       max number of processes              30861      30861
RSS         max resident set size                unlimited  unlimited  pages
RTPRIO      max real-time priority               0          0
RTTIME      timeout for real-time tasks          unlimited  unlimited  microsecs
SIGPENDING  max number of pending signals        30861      30861
STACK       max stack size                       8388608    unlimited  bytes
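The same NOFILE numbers can also be read directly from procfs, which is handy for scripting a check; a small sketch, again assuming a single asterisk process:

```shell
# The kernel exposes each process's effective limits in
# /proc/<pid>/limits; the "Max open files" row should match
# the NOFILE line that prlimit prints (soft, then hard).
grep "Max open files" /proc/"$(pidof asterisk)"/limits
```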
Accordingly I have put this into a cronjob run each minute:
prlimit --pid `pidof asterisk` --nofile=786000:786000
to try and force the running binary to keep a high file limit on the live Asterisk process (sources say to keep it below the actual system-wide file limit, in my case 800 000 files).
I’ll see if this maybe helps.
So it appears that for some reason the live running Asterisk process either
“loses track” of how many open files it may have, or, when it starts, it somehow does not pick up the correct maximum number of open files as set in the system / kernel config?
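One likely explanation, if Asterisk is started by systemd: the process inherits its NOFILE limit from the service unit (whose default soft limit is 1024 on many distros), and settings in /etc/security/limits.conf or a shell ulimit never reach it. A possible permanent fix, assuming a systemd setup (the drop-in path below is illustrative, not from the original post):

```
# /etc/systemd/system/asterisk.service.d/limits.conf (hypothetical drop-in)
[Service]
LimitNOFILE=786000
```

followed by `systemctl daemon-reload` and a restart of Asterisk, so the process starts with the raised limit instead of being patched afterwards from cron. Asterisk also has its own `maxfiles` option in asterisk.conf, which raises the limit at startup and may be worth checking as well.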
Anyway, thank you for replying, I’ll monitor this new “Cronjob fixup” I’m trying and see if it helps.
No wonder it is complaining about running out of file handles if it ACTUALLY
only had 1024 available!