Lost Outgoing SIP Packets


Hi list!

I have a problem where SIP packets sent by Asterisk do not hit the wire, and
I don’t know what could cause this.

I’m running Asterisk 1.8.28_cert5 with full SIP debug. At the same time, I’m
doing a tcpdump of the traffic on the network interface. I can see in the SIP
debug log that Asterisk is sending packets. Most of the time, I can see
those packets in the tcpdump, as you would expect. However, sometimes
Asterisk sends a packet that *does not show up* in the tcpdump. Asterisk
then does several retransmits (that also don’t show up). The next packet
that is not a retransmit does show up again.

This causes Asterisk to log the peer it was sending packets to temporarily
as Lagged or unreachable.

There is no outgoing firewall on this box.

Could anyone give me some pointers where to look?

If Asterisk logs “VERBOSE[13019] chan_sip.c: Reliably Transmitting (NAT) to
x.x.x.x:” you would expect to see that packet in a tcpdump trace, right?
What could cause this not to be so? Are there network statistics I could
look at? Is there a counter in /proc or /sys for problems with sending
packets? Anything?

If more information is necessary please do let me know.

Thanks a lot in advance,

Roel
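
On the question of counters in /proc: one place to look for UDP-level send problems is the `Udp:` pair of lines in /proc/net/snmp, which on Linux holds a header line of counter names followed by a line of values. The sketch below is only an illustration, not something from the thread; the function names are made up, and which counters matter (for example `SndbufErrors`) is an assumption to verify on the affected box.

```python
import os


def parse_udp_counters(snmp_text):
    """Return the Udp: counters from /proc/net/snmp-style text as a dict."""
    # The file has one "Udp:" line with names and one "Udp:" line with values.
    udp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Udp:")]
    if len(udp_lines) < 2:
        return {}
    names, values = udp_lines[0][1:], udp_lines[1][1:]
    return {name: int(value) for name, value in zip(names, values)}


def read_udp_counters(path="/proc/net/snmp"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as f:
        return parse_udp_counters(f.read())


if __name__ == "__main__" and os.path.exists("/proc/net/snmp"):
    counters = read_udp_counters()
    # Print only the counters most relevant to sending and receiving errors.
    for name in ("OutDatagrams", "SndbufErrors", "InErrors", "RcvbufErrors"):
        print(name, counters.get(name))
```

Taking one reading before and one after a missing packet, and comparing them, would show whether any of these counters moved.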

13 thoughts on - Lost Outgoing SIP Packets

  • The tcpdump that you are running is on the Asterisk box or via port mirroring?

    Regards,

    Dovid


  • Dovid Bender writes:

    It’s on the asterisk box itself.

    I’ve already replaced the network card – no change.

    Thanks,

    Roel

  • Just guessing: I would verify that the output of iptables -L -nv shows no dropped packets, try disabling SELinux, and look at the limits of the Asterisk PID (cat /proc//limits). I know the default open-files limit for RHEL is 1024, which was never enough for us.

    Regards,

    Dovid
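
The limits check suggested above can be scripted. This is a minimal sketch that assumes the usual layout of the limits file under /proc (a `Max open files` row with soft and hard columns); the function names are hypothetical, and the demo reads the limits of the current process via /proc/self/limits rather than those of Asterisk.

```python
import os


def _to_int(value):
    """Convert a limits column to int, or None for 'unlimited'."""
    return None if value == "unlimited" else int(value)


def parse_open_files_limit(limits_text):
    """Return (soft, hard) for the 'Max open files' row, or None if absent."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Fields: 'Max', 'open', 'files', soft, hard, units
            soft, hard = line.split()[3:5]
            return _to_int(soft), _to_int(hard)
    return None


if __name__ == "__main__" and os.path.exists("/proc/self/limits"):
    with open("/proc/self/limits") as f:
        print(parse_open_files_limit(f.read()))
```

To inspect Asterisk itself, the same parser would be pointed at the limits file of the Asterisk process ID instead of `self`.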


  • Dovid Bender writes:

    Thanks for the hints. SELinux is disabled, there is no outgoing firewall
    (anymore) on this box, and the limits seem fine: 200637 open files.

    Ifconfig output looks like this:

    root@communiceer:~# ifconfig eth1
    eth1 Link encap:Ethernet HWaddr b4:99:ba:a9:3e:e5
    inet addr:x.x.x.x Bcast:x.x.x.127 Mask:255.255.255.128
    UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
    RX packets:5967421 errors:0 dropped:21425 overruns:0 frame:0
    TX packets:6085933 errors:0 dropped:0 overruns:0 carrier:0
    collisions:0 txqueuelen:1000
    RX bytes:1223605260 (1.1 GiB) TX bytes:2096293903 (1.9 GiB)
    Interrupt:17 Memory:fbfe0000-fc000000

    I was thinking maybe there’s a problem with the transmit queue, but 1000 is
    the default value for txqueuelen and I have never needed to change it.

    I have the default queueing discipline:

    root@communiceer:~# tc qdisc show dev eth1
    qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1

    The output of ethtool also looks good:

    root@communiceer:~# ethtool eth1
    Settings for eth1:
    Supported ports: [ TP ]
    Supported link modes: 10baseT/Half 10baseT/Full
    100baseT/Half 100baseT/Full
    1000baseT/Full
    Supports auto-negotiation: Yes
    Advertised link modes: 10baseT/Half 10baseT/Full
    100baseT/Half 100baseT/Full
    1000baseT/Full
    Advertised pause frame use: No
    Advertised auto-negotiation: Yes
    Speed: 1000Mb/s
    Duplex: Full
    Port: Twisted Pair
    PHYAD: 1
    Transceiver: internal
    Auto-negotiation: on
    MDI-X: on
    Supports Wake-on: pumbg
    Wake-on: g
    Current message level: 0x00000007 (7)
    drv probe link
    Link detected: yes

    And the nic stats also look good:

    root@communiceer:~# ethtool -S eth1
    NIC statistics:
    rx_packets: 6071960
    tx_packets: 6189424
    rx_bytes: 1244435132
    tx_bytes: 2117335817
    rx_broadcast: 293751
    tx_broadcast: 193
    rx_multicast: 29827
    tx_multicast: 0
    rx_errors: 0
    tx_errors: 0
    tx_dropped: 0
    multicast: 29827
    collisions: 0
    rx_length_errors: 0
    rx_over_errors: 0
    rx_crc_errors: 0
    rx_frame_errors: 0
    rx_no_buffer_count: 0
    rx_missed_errors: 0
    tx_aborted_errors: 0
    tx_carrier_errors: 0
    tx_fifo_errors: 0
    tx_heartbeat_errors: 0
    tx_window_errors: 0
    tx_abort_late_coll: 0
    tx_deferred_ok: 0
    tx_single_coll_ok: 0
    tx_multi_coll_ok: 0
    tx_timeout_count: 0
    tx_restart_queue: 0
    rx_long_length_errors: 0
    rx_short_length_errors: 0
    rx_align_errors: 0
    tx_tcp_seg_good: 37559
    tx_tcp_seg_failed: 0
    rx_flow_control_xon: 0
    rx_flow_control_xoff: 0
    tx_flow_control_xon: 0
    tx_flow_control_xoff: 0
    rx_csum_offload_good: 3447739
    rx_csum_offload_errors: 2
    rx_header_split: 0
    alloc_rx_buff_failed: 0
    tx_smbus: 0
    rx_smbus: 0
    dropped_smbus: 0
    rx_dma_failed: 0
    tx_dma_failed: 0
    rx_hwtstamp_cleared: 0
    uncorr_ecc_errors: 0
    corr_ecc_errors: 0
    tx_hwtstamp_timeouts: 0

    So I really don’t know where else to look.

    Thanks,

    Roel
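
Scanning a long `ethtool -S` listing by eye is error-prone, so here is a small sketch that parses such output and reports only the non-zero counters whose names suggest trouble. The keyword list and function names are assumptions chosen for illustration; counter names differ per driver. Applied to the listing above, it should flag only `rx_csum_offload_errors: 2`.

```python
# Substrings that hint at a problem counter (an illustrative, incomplete list).
SUSPECT_WORDS = ("error", "drop", "fail", "timeout", "missed", "abort")


def parse_ethtool_stats(text):
    """Parse 'name: value' lines from `ethtool -S` output into a dict."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        # Skips the 'NIC statistics:' header and any non-numeric lines.
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def nonzero_suspects(stats):
    """Return the counters that are non-zero and look like error counters."""
    return {name: value for name, value in stats.items()
            if value and any(word in name for word in SUSPECT_WORDS)}
```

The output of `ethtool -S eth1` can be piped into a script that calls `nonzero_suspects(parse_ethtool_stats(sys.stdin.read()))`; running it twice and comparing the results shows which counters are still increasing.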

  • Hi Roel

    Just guessing: do you have conntrack enabled?
    If not, “modprobe nf_conntrack_netlink” (you can remove it and its dependencies later)

    What are the outputs of sysctl net.netfilter.nf_conntrack_count and sysctl net.netfilter.nf_conntrack_max when the problem shows up?

    cheers

    Ethy

  • Ethy H. Brito writes:

    Yes, I do.

    Hm, good one. I’ll monitor them. Currently the values are:

    root@communiceer:~# sysctl net.netfilter.nf_conntrack_max
    net.netfilter.nf_conntrack_max = 65536

    root@communiceer:~# sysctl net.netfilter.nf_conntrack_count
    net.netfilter.nf_conntrack_count = 245

    I’ll report back if the count comes anywhere near maximum.

    Are there any other limits like these that might play a role here?

    Thanks,

    Roel
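
Monitoring the two conntrack values can be automated. The sketch below assumes the sysctl names map onto files under /proc/sys in the usual way (dots become path separators); the 80% warning threshold and the function names are arbitrary choices for illustration, not a recommendation from the thread.

```python
import os


def read_sysctl_int(name, root="/proc/sys"):
    """Read an integer sysctl such as net.netfilter.nf_conntrack_count."""
    path = os.path.join(root, *name.split("."))
    with open(path) as f:
        return int(f.read().strip())


def conntrack_usage(count, maximum):
    """Return the fraction of the conntrack table currently in use."""
    return count / maximum


if __name__ == "__main__":
    count_name = "net.netfilter.nf_conntrack_count"
    max_name = "net.netfilter.nf_conntrack_max"
    # The files only exist while the nf_conntrack module is loaded.
    if os.path.exists(os.path.join("/proc/sys", *count_name.split("."))):
        usage = conntrack_usage(read_sysctl_int(count_name),
                                read_sysctl_int(max_name))
        print(f"conntrack table {usage:.1%} full")
        if usage > 0.8:  # hypothetical threshold
            print("warning: conntrack table is close to its maximum")
```

With the values shown above (245 of 65536) the table is well under 1% full, so running this from cron or a loop during a failure would show whether the count spikes at that moment.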

  • Hello;
    I ran into a similar problem not long ago. Always try the easiest (and cheapest) solutions first. My solution was to replace the Ethernet cable and then to change the network switch port. That did the trick. Errors on switches tend to be due to faulty switch ports.

    Regards;
    John V.


  • I think nf_conntrack_count is the one you should be aware of. But you can play with the timeouts in net.ipv4.netfilter.ip_conntrack_* if you experience any problems.

    Cheers

    Ethy

  • Did you notice the RX dropped:21425 value in your ifconfig output?

    Should not be a problem since you are complaining about TX packets, not RX, but…

    does dmesg say anything about this?

    Cheers

    Ethy

  • Ethy H. Brito writes:

    Yes, I did. But I assumed (hmm) that this was caused because this server is
    not IPv6-enabled. See also here: http://stackoverflow.com/a/30703716/3172389

    Nope, nothing at all.

    But what I can do is set the interface into promiscuous mode with tcpdump,
    then there should be no dropped packets at all, I think. I’ll check to make
    sure.

    Thanks for the heads up, and thanks for thinking with me everyone!

    Cheers,

    Roel

  • A word of advice. I have been stumped before by tcpdump not showing all packets. It turns out that the kernel can indeed drop packets from the capture if it is overwhelmed. The first sign of trouble is when you hit Control-C to finish tcpdump and get a message like ‘X packets dropped by kernel’. By default tcpdump tries to do reverse DNS lookups on all IPs and simply does not have enough time under high packet loads. The solution is generally to run “tcpdump -nn” to avoid DNS lookups.
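
To make the check for capture drops less manual, the summary that tcpdump prints on exit can be parsed. This is a sketch under assumptions: it expects the summary line to have the form "N packets dropped by kernel", the helper names are invented, and the filter on UDP port 5060 assumes SIP runs on its standard port on this box.

```python
import re

# Matches the summary line tcpdump writes to stderr when it exits.
_DROPPED_RE = re.compile(r"^(\d+) packets? dropped by kernel", re.MULTILINE)


def packets_dropped_by_kernel(tcpdump_stderr):
    """Return the kernel drop count from tcpdump's exit summary, or None."""
    match = _DROPPED_RE.search(tcpdump_stderr)
    return int(match.group(1)) if match else None


def build_capture_command(interface, pcap_path, sip_port=5060):
    """Build a tcpdump command line that skips name lookups (-nn)."""
    return ["tcpdump", "-nn", "-i", interface, "-w", pcap_path,
            "udp", "port", str(sip_port)]
```

If `packets_dropped_by_kernel` returns anything other than 0 for a capture, a packet missing from that capture does not prove it never left the machine.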

  • Roel,

    Just another thought bouncing around… Your ifconfig output was specific to eth1. Is there an eth0 too? Is there a chance packets are heading to that other interface when they shouldn’t be? Running a second tcpdump on eth0 at the same time should at least disprove the theory quickly.

    Pete
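
A quick way to test the wrong-interface theory without a second capture is to ask the kernel which local address it would pick for a given peer. The sketch below uses a connected UDP socket for that; connecting a UDP socket only fixes the route and source address and does not send a packet. The function name and the default port 5060 are assumptions, and if Asterisk is bound to a specific address the result shows only the kernel's routing choice, not necessarily what Asterisk uses.

```python
import socket


def local_address_for(dest_ip, dest_port=5060):
    """Return the local IPv4 address the kernel would use to reach dest_ip."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        # For UDP, connect() performs the route lookup but sends nothing.
        s.connect((dest_ip, dest_port))
        return s.getsockname()[0]
```

Calling `local_address_for` with the address of the affected peer and comparing the result with the address configured on eth1 would show whether traffic to that peer is routed out of a different interface.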