Lost Outgoing SIP Packets
Hi list!
I have a problem where SIP packets sent by Asterisk do not hit the wire, and
I don’t know what could cause this.
I’m running Asterisk 1.8.28_cert5 with full SIP debug. At the same time, I’m
doing a tcpdump of the traffic on the network interface. I can see in the SIP
debug log that Asterisk is sending packets. Most of the time, I can see
those packets in the tcpdump, as you would expect. However, sometimes
Asterisk sends a packet that *does not show up* in the tcpdump. Asterisk
then retransmits it several times (the retransmits also don’t show up). The
next packet that is not a retransmit does show up again.
This causes Asterisk to log the peer it was sending packets to temporarily
as Lagged or unreachable.
There is no outgoing firewall on this box.
Could anyone give me some pointers where to look?
If Asterisk logs “VERBOSE[13019] chan_sip.c: Reliably Transmitting (NAT) to
x.x.x.x:” you would expect to see that packet in a tcpdump trace, right?
What could cause this not to be so? Are there network statistics I could
look at? Is there a counter in /proc or /sys for problems with sending
packets? Anything?
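On the question of counters: Linux exposes per-protocol statistics in /proc/net/snmp, and per-interface counters such as tx_dropped under /sys/class/net/eth1/statistics/. A minimal sketch of reading the UDP counters follows; the function name is made up for illustration, and it assumes SIP is running over UDP here, which the thread does not state.

```python
from pathlib import Path


def parse_proc_net_snmp(text):
    """Parse the header/value line pairs of /proc/net/snmp into nested dicts."""
    stats = {}
    lines = [line.split() for line in text.splitlines() if line.strip()]
    # Lines come in pairs: "Udp: Name Name ..." followed by "Udp: 1 2 ...".
    for header, values in zip(lines[0::2], lines[1::2]):
        proto = header[0].rstrip(":")
        stats[proto] = {
            name: int(value) for name, value in zip(header[1:], values[1:])
        }
    return stats


if __name__ == "__main__":
    snmp = Path("/proc/net/snmp")
    if snmp.exists():
        udp = parse_proc_net_snmp(snmp.read_text()).get("Udp", {})
        # SndbufErrors counts sends that failed for lack of buffer space.
        for name in ("OutDatagrams", "SndbufErrors", "RcvbufErrors", "InErrors"):
            print(name, udp.get(name))
```

Sampling these before and after a lost packet would show whether the kernel counted a send failure at all.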
If more information is necessary please do let me know.
Thanks a lot in advance,
Roel
13 thoughts on - Lost Outgoing SIP Packets
Is the tcpdump you are running on the Asterisk box itself, or via port mirroring?
Regards,
Dovid
Dovid Bender writes:
It’s on the asterisk box itself.
I’ve already replaced the network card – no change.
Thanks,
Roel
Just guessing: I would verify that the output of "iptables -L -nv" shows no dropped packets, try disabling SELinux, and look at the limits of the Asterisk PID (cat /proc//limits). I know the default for RHEL is 1024 open files, which was never enough for us.
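The limits check can also be scripted. This is a rough sketch, assuming the usual column layout of the per-process limits file under /proc; parse_limits is a hypothetical helper name, and the script inspects its own PID as a stand-in for the Asterisk PID.

```python
import os
import re
from pathlib import Path


def parse_limits(text):
    """Return {limit name: (soft, hard)} from the text of a per-process limits file."""
    limits = {}
    for line in text.splitlines()[1:]:  # skip the column header line
        # Columns are separated by runs of two or more spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) >= 3:
            limits[fields[0]] = (fields[1], fields[2])
    return limits


if __name__ == "__main__":
    pid = os.getpid()  # stand-in: substitute the PID of the Asterisk process
    path = Path(f"/proc/{pid}/limits")
    if path.exists():
        soft, hard = parse_limits(path.read_text())["Max open files"]
        print(f"Max open files: soft={soft} hard={hard}")
```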
Regards,
Dovid
Dovid Bender writes:
Thanks for the hints. SELinux is disabled, there is no outgoing firewall
(anymore) on this box, and the limits seem fine: 200637 open files.
Ifconfig output looks like this:
root@communiceer:~# ifconfig eth1
eth1 Link encap:Ethernet HWaddr b4:99:ba:a9:3e:e5
inet addr:x.x.x.x Bcast:x.x.x.127 Mask:255.255.255.128
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:5967421 errors:0 dropped:21425 overruns:0 frame:0
TX packets:6085933 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1223605260 (1.1 GiB) TX bytes:2096293903 (1.9 GiB)
Interrupt:17 Memory:fbfe0000-fc000000
I was thinking maybe there’s a problem with the transmit queue, but 1000 is
the default value for txqueuelen and I have never needed to change it.
I have the default queueing discipline:
root@communiceer:~# tc qdisc show dev eth1
qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
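The tc command above prints only the qdisc configuration. Adding -s (tc -s qdisc show dev eth1) also prints counters, including packets dropped by the queueing discipline. Below is a small sketch that pulls those counters out of the "Sent ..." line; the sample text in it is made up for illustration and is not output from this box.

```python
import re

# Matches the statistics line printed by "tc -s qdisc show".
SENT_LINE = re.compile(
    r"Sent (\d+) bytes (\d+) pkt "
    r"\(dropped (\d+), overlimits (\d+) requeues (\d+)\)"
)
FIELDS = ("bytes", "packets", "dropped", "overlimits", "requeues")


def parse_qdisc_stats(text):
    """Return the counters from the first 'Sent ...' line, or None if absent."""
    match = SENT_LINE.search(text)
    if match is None:
        return None
    return dict(zip(FIELDS, (int(group) for group in match.groups())))


if __name__ == "__main__":
    # Illustrative sample only; the numbers are invented.
    sample = (
        "qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0\n"
        " Sent 1000 bytes 10 pkt (dropped 0, overlimits 0 requeues 0)\n"
    )
    print(parse_qdisc_stats(sample))
```

A dropped count that rises while packets go missing would point at the transmit queue rather than at Asterisk.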
The output of ethtool also looks good:
root@communiceer:~# ethtool eth1
Settings for eth1:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Advertised pause frame use: No
Advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
MDI-X: on
Supports Wake-on: pumbg
Wake-on: g
Current message level: 0x00000007 (7)
drv probe link
Link detected: yes
And the nic stats also look good:
root@communiceer:~# ethtool -S eth1
NIC statistics:
rx_packets: 6071960
tx_packets: 6189424
rx_bytes: 1244435132
tx_bytes: 2117335817
rx_broadcast: 293751
tx_broadcast: 193
rx_multicast: 29827
tx_multicast: 0
rx_errors: 0
tx_errors: 0
tx_dropped: 0
multicast: 29827
collisions: 0
rx_length_errors: 0
rx_over_errors: 0
rx_crc_errors: 0
rx_frame_errors: 0
rx_no_buffer_count: 0
rx_missed_errors: 0
tx_aborted_errors: 0
tx_carrier_errors: 0
tx_fifo_errors: 0
tx_heartbeat_errors: 0
tx_window_errors: 0
tx_abort_late_coll: 0
tx_deferred_ok: 0
tx_single_coll_ok: 0
tx_multi_coll_ok: 0
tx_timeout_count: 0
tx_restart_queue: 0
rx_long_length_errors: 0
rx_short_length_errors: 0
rx_align_errors: 0
tx_tcp_seg_good: 37559
tx_tcp_seg_failed: 0
rx_flow_control_xon: 0
rx_flow_control_xoff: 0
tx_flow_control_xon: 0
tx_flow_control_xoff: 0
rx_csum_offload_good: 3447739
rx_csum_offload_errors: 2
rx_header_split: 0
alloc_rx_buff_failed: 0
tx_smbus: 0
rx_smbus: 0
dropped_smbus: 0
rx_dma_failed: 0
tx_dma_failed: 0
rx_hwtstamp_cleared: 0
uncorr_ecc_errors: 0
corr_ecc_errors: 0
tx_hwtstamp_timeouts: 0
So I really don’t know where else to look.
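With a counter list this long, it can help to filter for the non-zero problem counters instead of reading it by eye. A minimal sketch, assuming the "name: value" layout shown above; the keyword list is a guess at which counter names indicate trouble.

```python
def nonzero_problem_counters(
    ethtool_output,
    keywords=("error", "drop", "fail", "missed", "timeout", "abort"),
):
    """Return {counter: value} for non-zero counters whose name matches a keyword."""
    hits = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.strip().partition(":")
        value = value.strip()
        # Skip the "NIC statistics:" heading and anything that is not a number.
        if not sep or not value.isdigit():
            continue
        if int(value) != 0 and any(word in name for word in keywords):
            hits[name] = int(value)
    return hits


if __name__ == "__main__":
    # A few lines copied from the ethtool -S output above.
    excerpt = """NIC statistics:
    tx_packets: 6189424
    tx_errors: 0
    tx_dropped: 0
    rx_csum_offload_errors: 2
    tx_dma_failed: 0
    """
    print(nonzero_problem_counters(excerpt))
```

On the output posted above, the only counter this flags is rx_csum_offload_errors (2), which is on the receive side.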
Thanks,
Roel
Hi Roel
Just guessing: do you have conntrack enabled?
If not, “modprobe nf_conntrack_netlink” (you can remove it and its dependencies later)
What are the outputs of sysctl net.netfilter.nf_conntrack_count and sysctl net.netfilter.nf_conntrack_max
when the problem shows up?
cheers
Ethy
Ethy H. Brito writes:
Yes, I do.
Hm, good one. I’ll monitor them. Currently the values are:
root@communiceer:~# sysctl net.netfilter.nf_conntrack_max
net.netfilter.nf_conntrack_max = 65536
root@communiceer:~# sysctl net.netfilter.nf_conntrack_count
net.netfilter.nf_conntrack_count = 245
I’ll report back if the count comes anywhere near maximum.
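Monitoring the two values could look roughly like the sketch below. It assumes the sysctl names map to files under /proc/sys (dots become slashes), which holds for these two keys; the helper names are made up. When the table does fill up, the kernel normally also logs "nf_conntrack: table full, dropping packet" in dmesg.

```python
from pathlib import Path


def conntrack_usage(count, maximum):
    """Fraction of the connection tracking table currently in use."""
    return count / maximum


def read_sysctl_int(name):
    """Read an integer sysctl via its /proc/sys path (dots become slashes)."""
    return int((Path("/proc/sys") / name.replace(".", "/")).read_text())


if __name__ == "__main__":
    try:
        count = read_sysctl_int("net.netfilter.nf_conntrack_count")
        maximum = read_sysctl_int("net.netfilter.nf_conntrack_max")
    except OSError:
        # The files only exist while the conntrack module is loaded.
        print("conntrack sysctls not available on this machine")
    else:
        print(f"{count}/{maximum} = {conntrack_usage(count, maximum):.2%} in use")
```

With the values quoted above (245 of 65536), the table is well under 1% full.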
Are there any other limits like these that might play a role here?
Thanks,
Roel
Hello;
I ran into a similar problem not long ago. Always try the easiest (and cheapest) solutions first. My solution was to replace the Ethernet cable and then to change the network switch port. Did the trick. Errors seen on a switch tend to be due to faulty switch ports.
Regards;
John V.
I think nf_conntrack_count is the one you should be aware of. But you can play with the timeouts in net.ipv4.netfilter.ip_conntrack_* if you experience any problems.
Cheers
Ethy
Did you notice the RX "dropped:21425" value in your ifconfig output?
It should not be a problem, since you are complaining about TX packets, not RX, but...
does dmesg say anything about this?
Cheers
Ethy
Ethy H. Brito writes:
Yes, I did. But I assumed (hmm) that this was caused because this server is
not IPv6-enabled. See also here: http://stackoverflow.com/a/30703716/3172389
Nope, nothing at all.
But what I can do is set the interface into promiscuous mode with tcpdump,
then there should be no dropped packets at all, I think. I’ll check to make
sure.
Thanks for the heads up, and thanks for thinking with me everyone!
Cheers,
Roel
Doesn’t tcpdump ‘see’ packets before iptables?
A word of advice. I have been stumped before with tcpdump not showing all packets. It turns out that the kernel can indeed prevent packets from being captured if it is overwhelmed. The first sign of trouble is when you hit control-C when finishing up tcpdump and get messages like ‘kernel dropped X packets’. It turns out tcpdump by default tries to do reverse DNS queries on all IPs and simply does not have enough time under high packet loads. The solution is generally to run “tcpdump -nn” to avoid DNS lookups.
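For reference, the summary tcpdump prints on exit has three lines: packets captured, packets received by filter, and packets dropped by kernel. If the capture runs unattended, that summary can be checked with a few lines of code. A small sketch, assuming the summary text has been saved somewhere (tcpdump writes it to stderr); the function name is hypothetical.

```python
import re


def parse_tcpdump_summary(text):
    """Extract the three counters from tcpdump's exit summary."""
    counts = {}
    pattern = r"(\d+) packets? (captured|received by filter|dropped by kernel)"
    for number, label in re.findall(pattern, text):
        counts[label] = int(number)
    return counts


if __name__ == "__main__":
    # Illustrative summary text; the numbers are invented.
    summary = (
        "120 packets captured\n"
        "130 packets received by filter\n"
        "10 packets dropped by kernel\n"
    )
    counts = parse_tcpdump_summary(summary)
    if counts.get("dropped by kernel", 0) > 0:
        print("capture is incomplete:", counts)
```

A non-zero "dropped by kernel" count means a packet missing from the capture may still have gone out on the wire.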
Roel,
Just another thought bouncing around… Your ifconfig output was specific to eth1. Is there an eth0 too? Is there a chance packets are heading to that other interface when they shouldn’t be? Running a second tcpdump on eth0 at the same time should at least disprove the theory quickly.
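Besides a second tcpdump, one quick way to see which local address the kernel would pick for a given peer is to connect() a UDP socket to it and read back the socket's own address; no packet is sent by doing this. The sketch below assumes IPv4 and the default SIP port 5060, and the function name is made up. Note that Asterisk may bind to a specific address, in which case its choice can differ from what this reports.

```python
import socket


def source_address_for(dest_ip, dest_port=5060):
    """Return the local IPv4 address the kernel would use to reach dest_ip."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # connect() on a UDP socket only performs the route lookup;
        # nothing is transmitted.
        sock.connect((dest_ip, dest_port))
        return sock.getsockname()[0]


if __name__ == "__main__":
    # Loopback is used here as a stand-in; substitute the peer's address.
    print(source_address_for("127.0.0.1"))
```

If the address returned for the affected peer belongs to eth0 rather than eth1, that would support the theory.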
Pete