How To Read IRQs And Timing Slips Values

Home » Asterisk Users » How To Read IRQs And Timing Slips Values
Asterisk Users 16 Comments

Hi,

On a Asterisk 1.8.12 system working OK for months (>100k calls proceed), users are complaining for bad audio.

My setup is:
PSTN <--E1/PRI ---> Asterisk <--- E1/PRI---> Siemens HiPath <---E1/PRI ---> PSTN

asterisk -rx “dahdi show version”
DAHDI Version: SVN-trunk-r10414 Echo Canceller: HWEC

asterisk -rx “pri show version”
libpri version: 1.4.12

A quick glance at Asterisk logs shows lines like this:
[2014-01-09 17:19:34] NOTICE[26034]: chan_dahdi.c:3099
my_handle_dchan_exception: PRI got event: HDLC Abort (6) on D-channel of span 1
[2014-01-09 17:19:35] NOTICE[26035]: chan_dahdi.c:3099
my_handle_dchan_exception: PRI got event: HDLC Abort (6) on D-channel of span 2
[2014-01-09 17:19:49] NOTICE[26035]: chan_dahdi.c:3099
my_handle_dchan_exception: PRI got event: HDLC Abort (6) on D-channel of span 2

I read an old thread inviting an admin to check for shared IRQs and timing slips.

My questions are:

1. cat /proc/interrupts ‘s output is:
# cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3 CPU4
CPU5 CPU6 CPU7
0: 90147 0 0 0 0
0 0 0 IO-APIC-edge timer
1: 2 0 0 0 0
0 0 0 IO-APIC-edge i8042
8: 0 0 1 0 0
0 0 0 IO-APIC-edge rtc0
9: 0 0 0 0 0
0 0 0 IO-APIC-fasteoi acpi
12: 4 0 0 0 0
0 0 0 IO-APIC-edge i8042
14: 93 0 0 0 0
0 0 0 IO-APIC-edge ata_piix
15: 0 0 0 0 0
0 0 0 IO-APIC-edge ata_piix
16: 3378646209 3378695076 3378691115 3378697362 3378691116 3378706831
3378710635 3378702358 IO-APIC-fasteoi wct2xxp

Can I positively conclude that my Dahdi PRI board IS NOT sharing IRQ (which is good) ?

2. What would you suggest reading the following output ?

cat /proc/dahdi/2
Span 2: TE2/0/2 “T2XXP (PCI) Card 0 Span 2” (MASTER) HDB3/CCS
Timing slips: 175319

32 TE2/0/2/1 Clear (In use) (EC: VPMOCT064 – INACTIVE)
33 TE2/0/2/2 Clear (In use) (EC: VPMOCT064 – INACTIVE)
34 TE2/0/2/3 Clear (In use) (EC: VPMOCT064 – INACTIVE)
35 TE2/0/2/4 Clear (In use) (EC: VPMOCT064 – INACTIVE)
36 TE2/0/2/5 Clear (In use) (EC: VPMOCT064 – INACTIVE)
37 TE2/0/2/6 Clear (In use) (EC: VPMOCT064 – INACTIVE)
38 TE2/0/2/7 Clear (In use) (EC: VPMOCT064 – INACTIVE)
39 TE2/0/2/8 Clear (In use) (EC: VPMOCT064 – INACTIVE)
40 TE2/0/2/9 Clear (In use) (EC: VPMOCT064 – INACTIVE)
41 TE2/0/2/10 Clear (In use) (EC: VPMOCT064 – INACTIVE)
42 TE2/0/2/11 Clear (In use) (EC: VPMOCT064 – INACTIVE)
43 TE2/0/2/12 Clear (In use) (EC: VPMOCT064 – INACTIVE)
44 TE2/0/2/13 Clear (In use) (EC: VPMOCT064 – INACTIVE)
45 TE2/0/2/14 Clear (In use) (EC: VPMOCT064 – INACTIVE)
46 TE2/0/2/15 Clear (In use) (EC: VPMOCT064 – INACTIVE)
47 TE2/0/2/16 HDLCFCS (In use) (EC: VPMOCT064 – INACTIVE)
48 TE2/0/2/17 Clear (In use) (EC: VPMOCT064 – INACTIVE)
49 TE2/0/2/18 Clear (In use) (EC: VPMOCT064 – INACTIVE)
50 TE2/0/2/19 Clear (In use) (EC: VPMOCT064 – INACTIVE)
51 TE2/0/2/20 Clear (In use) (EC: VPMOCT064 – INACTIVE)
52 TE2/0/2/21 Clear (In use) (EC: VPMOCT064 – INACTIVE)
53 TE2/0/2/22 Clear (In use) (EC: VPMOCT064 – INACTIVE)
54 TE2/0/2/23 Clear (In use) (EC: VPMOCT064 – INACTIVE)
55 TE2/0/2/24 Clear (In use) (EC: VPMOCT064 – INACTIVE)
56 TE2/0/2/25 Clear (In use) (EC: VPMOCT064 – INACTIVE)
57 TE2/0/2/26 Clear (In use) (EC: VPMOCT064 – INACTIVE)
58 TE2/0/2/27 Clear (In use) (EC: VPMOCT064 – INACTIVE)
59 TE2/0/2/28 Clear (In use) (EC: VPMOCT064 – INACTIVE)
60 TE2/0/2/29 Clear (In use) (EC: VPMOCT064 – INACTIVE)
61 TE2/0/2/30 Clear (In use) (EC: VPMOCT064 – INACTIVE)
62 TE2/0/2/31 Clear (In use) (EC: VPMOCT064 – INACTIVE)

3. As shown above, my box has two connections with PSTN (same provider for both): one direct, one through an HiPath PBX. How can I double check timing slips don’t come from “inconsistency between both clock sources” ?
My first thought would be to unplug the link between Asterisk and HiPath and compare two /pro/dahddi/1 outputs. Thoughts ?

Regards

16 thoughts on - How To Read IRQs And Timing Slips Values

  • Yes, there is not IRQ sharing going on.

    So span 2 has slips, which could explain the HDLC aborts you saw previously.

    So you’ve probably nailed it here. It *seems* like the HiPath PBX is regenerating the clock on the downstream port based on the other information.

    Yes, You can monitor the timing slips to see the rate at which they occur, then connect just one span direct and make sure you don’t have slips, then one span directly to the HiPath and make sure you don’t have slips.

    If there isn’t anyway to configure HiPath to provide the exact same clock as the provider to any downstream devices, then you will need to use two single port cards in order to sync to two different clocks.

    Cheers, Shaun

  • 2014/1/9 Shaun Ruffell

    With a single span directly connected to PSTN I’m still getting timing slips (140 slips/hour). Would you agree to qualify this rate as excessive ?

    Given these figures, may I also exclude an hardware failure inside my card or on the hosting machine ?
    In other words, how to detect a timing slip, Dahdi must use some inner clock as a reference, doesn’t it ? Could this “inner clock” be presently broken ?

  • Yes, this is excessive.

    You’ve configured the card to recover timing from the provider? If so, do your slips follow the actual cable that you were using to connect to provider or PBX?

  • 2014/1/10 Shaun Ruffell

    I’m not sure but I don’t think so as I’ve just configured the card with:

    span=1,1,0,ccs,hdb3
    bchan=1-15,17-31
    dchan
    echocanceller=oslec,1-15,17-31

    span=2,2,0,ccs,hdb3
    bchan2-46,48-62
    dchanG
    echocanceller=oslec,32-46,48-62

    Span1 is the one direct to provider equipement. Span2 is thh one that was connected to HiPath and which is simply unplugged

  • That looks correct. You might want to check your cables next. Do you only get timing slips when connected to the provider, or to the HiPath as well?

  • 2014/1/10 Shaun Ruffell

    For a couple of days now, I’m only connected to the provider. Just in case, I commented out in system.conf and dahdi-channels.conf, any reference to “the HiPath span” and ran a “dahdi restart now”:
    unfortunately, I’m still observing timing slips at a rate of 1220
    slips/hour.

    I have a spare TE4xx board. Shall I replace current TE2xx with this spare and see if things improve or would you advise an other check ?

    Regards

  • If you have another board, yes, you could try. But I would recommend checking all your cables, etc. Also, while highly unlikely, I’ve heard of cases in the past where some smaller providers were expecting to source timing from customer premise PBX (since they were acting as a SIP gateway on the backend).

    Your provider does expect you to source their clock?

  • Ask your carrier to test the circuit. Often HDLC errors, especially with modern cards, are caused by a dirty T-1 not a PBX or card issue.

    —–Original Message—

  • I basically had the same issue as you for one of my sites. I tried everything under the sun to figure it out, change cables, loop back test, change out hardware, clocking, etc.

    In the end I had to upgrade dahdi to 2.7+ and the issue went away. Never did figure out the real problem, but to this day I think the issue was a delay on the frames from the PCI bus into the software.

    All that to say, try upgrading DAHDI and see what happens.

  • I had a late phone call yesterday with provider’s level-1 support team.

    As strange as it seems, the guy said he could “read clock slips in his box logs though his box is supposed to provide clock and not to use my PBX’s one”. I’m 100% sure my PBX is configured to use provider’s clock (but I won’t swear my PBX is currently using provider’s clock).

    I think I will insert a new Patton box between my provider’s one and my PBX
    to see what happens. The setup would be:
    PSTN –> provider’s equipment <----> Patton GW <----> Asterisk

    Then I’ll also replace the card inside my PBX to see if things are changing.

    2014/1/14 Paul Belanger

  • I have had to power the server down, UNPLUG the power, leave unplugged for 4 minutes, power up. I had a T1 timing issue this procedure fixed.

    Adrian

  • As far as Olivier’s concern, I still vote there is some physical cabling issue that is causing problems.

    However, just for posterity, in my experience if HDLC aborts are occuring and there are timing slips, it does not have anything to do with the card / host communication, but rather the issue has more to do with the framer and connection to provider.

    This is because the timing slips are reported directly by the framer and that doesn’t depend on the host communication.

    Just FYI…

  • I could replace the card this morning and the timing slips disappeared.

    Given Adrian’s testimony, that doesn’t mean the card is faulty but as the card is now off service, I’m really eagger to investigate further.

    At the moment, I can’t insert this card and test it again in my lab but I’ll certainly do it within a couple of hours (and report here). Stay tuned …

    2014/1/14 Shaun Ruffell

  • Follow up on this thread:

    Today, I could at last insert the TE220 board into an other machine in my lab.

    1. For the moment, I plugged the card into an available PCIe x1 slot and checked IRQs with:

    # dmesg

    [ 42.969568] dahdi: Version: SVN-trunk-r10414
    [ 43.281832] wct4xxp 0000:04:08.0: PCI INT A -> GSI 17 (level, low) ->
    IRQ 17
    [ 43.281855] wct4xxp 0000:04:08.0: 5th gen card with initial latency of 2
    and 1 ms per IRQ
    [ 43.281861] wct4xxp 0000:04:08.0: Firmware Version: c01a016d
    [ 43.284411] wct4xxp 0000:04:08.0: FALC Framer Version: 3.1
    [ 43.284420] IRQ 17/wct2xxp: IRQF_DISABLED is not guaranteed on shared IRQs
    [ 43.284465] wct4xxp 0000:04:08.0: Found a Wildcard: Wildcard TE220 (5th Gen)
    [ 43.306504] wct4xxp 0000:04:08.0: firmware: requesting dahdi-fw-oct6114-064.bin
    [ 43.568475] VPM450: echo cancellation for 64 channels
    [ 48.722467] wct4xxp 0000:04:08.0: VPM450: hardware DTMF disabled.
    [ 48.722471] wct4xxp 0000:04:08.0: VPM450: Present and operational servicing 2 span(s)
    [ 50.304512] wct4xxp 0000:04:08.0: TE2XXP: Span 1 configured for CCS/HDB3
    [ 50.304543] wct4xxp 0000:04:08.0: RCLK source set to span 1
    [ 50.304547] wct4xxp 0000:04:08.0: Recovered timing mode, RCLK set to span 1
    [ 50.304615] wct4xxp 0000:04:08.0: SPAN 1: Primary Sync Source

    # cat /proc/interrupts
    CPU0 CPU1
    0: 46 24 IO-APIC-edge timer
    1: 0 2 IO-APIC-edge i8042
    7: 1 0 IO-APIC-edge parport0
    8: 0 1 IO-APIC-edge rtc0
    9: 0 0 IO-APIC-fasteoi acpi
    12: 0 6 IO-APIC-edge i8042
    14: 0 0 IO-APIC-edge pata_atiixp
    15: 0 0 IO-APIC-edge pata_atiixp
    16: 1 339 IO-APIC-fasteoi xhci_hcd:usb3, ohci_hcd:usb4, ohci_hcd:usb5, HDA Intel
    17: 2216 7235841 IO-APIC-fasteoi ehci_hcd:usb1, wct2xxp

    Does it qualify as “incompatible with Dahdi because of shared IRQ” or can I
    get along and keep testing on this machine ?

    2. I plugged a straight patch cord into one E1 port and the other cord’s plug into the small socket provided with card.
    (I’m referring here to a female RJ45 socket mounted on a 1cm x 2cm PCB)

    The port light turns green and dahdi_tool says status is OK.

    At the same time, I’m seeing timing slips as with:
    # cat /proc/dahdi/1
    Span 1: TE2/0/1 “T2XXP (PCI) Card 0 Span 1” (MASTER) HDB3/CCS ClockSource
    Timing slips: 75

    The rate is 3 slips/minute. Has this figure any meaning (I don’t have any other card to compare with) ?

    3. When I’m plugging the same E1 port into a Smartnode E1 gateway port
    (then using an E1 crossover cable), dahdi_tool tells status is OK
    but asterisk logs are full of lines such as (~ 20 occurences per second)):
    [2014-02-03 13:46:04] NOTICE[2076]: chan_dahdi.c:3099
    my_handle_dchan_exception: PRI got event: HDLC Abort (6) on D-channel of span 1

    Sometimes, a line like the one bellow slides between the above ones.
    [2014-02-03 13:50:24] NOTICE[2076]: chan_dahdi.c:3099
    my_handle_dchan_exception: PRI got event: HDLC Bad FCS (8) on D-channel of span 1

    Thoughts ?

    2014-01-15 Olivier :

  • I would suggest updating DAHDI. We moved from svn to git years ago and that repository is not maintained.

    This is fine. I would only worry about sharing irqs with hardware that would compete for interrupt time, like a network controller or disk controller.

    3 slips a minute is too high. You’ll likely hear audio artifacts like clicks or pops if you were listening to audio on the line. This may give you a bit more info on acceptable slip rates. http://www.youtube.com/watch?v=2UNfzEuv1uk#t78

    HDLC aborts are a symptom of timing slips, typically.

    If you’re legitimately recovering timing from the network properly, then I would look at what the upstream provider is doing. Your provider should be able to run long term loop/pattern tests to your smartjack to determine link quality issues.