Yesterday I ran into a show-stopper problem with Asterisk 13.6 / PJSIP 2.4.5.
My users couldn’t use their phones for the whole day; neither incoming nor outgoing calls worked. This was puzzling because Asterisk had been running stably for the past weeks. Analysis led to the following conclusion:
In the early morning, the provider (trunk) had problems and returned some 503 responses to re-registration requests. Asterisk then retried the re-registration only one or two times per registration object, *although* max_retries had not been set in the registration configuration, i.e. was at its default value of 10. Of course, Asterisk giving up on re-registration after one or two retries is not acceptable (think of a user who has to make an emergency call), so I have to find out the reason for it (it could be a misconfiguration on my side).
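To make clear which options I am talking about, here is a sketch of a registration section with the relevant settings written out explicitly. The section name, the URIs and the auth reference are placeholders and not my real configuration; the values shown are what I understand to be the documented defaults, so leaving them out should behave the same way.

```ini
; Hypothetical example: section name, URIs and auth reference are placeholders.
[mytrunk-reg]
type = registration
server_uri = sip:registrar.example.com
client_uri = sip:1234567@registrar.example.com
outbound_auth = mytrunk-auth      ; assumed to refer to a separate type=auth section
expiration = 3600                 ; requested registration lifetime in seconds
retry_interval = 60               ; seconds between retries after a failed attempt
max_retries = 10                  ; attempts before Asterisk gives up
forbidden_retry_interval = 0      ; 0 = do not retry after a 403 response
auth_rejection_permanent = yes    ; treat rejected credentials as a permanent failure
```

With settings like these I would have expected up to ten attempts, roughly one minute apart, before the registration was abandoned.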
Thus, I have the following questions:
1) Are there common reasons for Asterisk not retrying the re-registration according to max_retries, but giving up after 1 or 2 times?
2) Could there be a header in the response to the re-registration request which tells Asterisk to stop retrying the re-registration completely (similar to the header that says after which time interval the re-registration should be tried again)? If yes, how could we override that header?
3) Could it be that Asterisk stops retrying the re-registration if it gets *no* response at all, i.e. if the registrar is down completely and does not even reply with an error code?
4) Is there a way to override the re-registration interval given in the response returned by the registrar? I can understand that providers want to protect their servers and don’t want clients to retry the registration every second, but in my case, Asterisk had to wait up to 3 minutes before the next re-registration attempt, and that is far from acceptable, notably because the provider in its infinite wisdom might decide to increase that interval further (e.g. to 1 hour). So I really hope that it is possible to make Asterisk / PJSIP retry the registration according to retry_interval and not according to what the registrar wants.
5) Does Asterisk cache DNS results if the DNS manager is disabled? Many providers return a different IP address for the same host name on every query for load balancing. If one of the registrar servers is down, the provider (hopefully) knows about that and no longer returns that server’s IP address in DNS replies. This only helps if Asterisk does not cache earlier DNS replies. If it does, it will keep trying indefinitely to re-register with a server that is down. When researching the problem, I found a statement that Asterisk just uses the bind libraries for DNS queries, but that doesn’t answer whether, and for how long, Asterisk caches DNS replies.
I really hope that some experienced Asterisk user or an expert could take the time to answer these questions.
I know that the Asterisk people always try to educate users to file bug reports, provide backtraces and so on. But in this case, that is not possible for me, since I am absolutely sure that I can’t get the provider to return 503 errors intentionally, and I haven’t seen any problems since I restarted Asterisk yesterday evening to make it work again.
Thank you very much for any answer, hint or bit of background information.