DAHDI for Meetme on AMD64 arch

I’m in the process of replacing an old server with a new one and am making some changes in the infrastructure; the biggest change in my eyes is moving from i386 to AMD64 arch. Yesterday I began migrating some users from the old to the new server.

After only 57 concurrent calls in about 13 conferences the sound starts losing quality.
The server uses DAHDI 2.6.0 for timing but no DAHDI hardware.

dahdi_test gives results like this when the server is under that load:
100.000% 99.999% 99.994% 99.998% 99.999% 99.616% 99.614% 99.997% 99.998% 99.618% 99.615% 99.994% 99.987% 99.626% 99.628% 99.993%
99.626% 100.000% 100.000% 99.622% 99.999% 99.607% 99.604% 99.627% 99.621% 99.629% 99.627% 99.998% 99.622% 99.995% 99.621% 99.996%

Results from dahdi_test with only some calls active:
99.999% 99.999% 99.990% 99.998% 99.999% 99.995% 99.995% 99.993% 99.997% 99.993% 99.999% 99.998% 99.996% 99.996% 99.998% 99.998%
99.991% 99.998% 99.995% 99.995% 99.987% 99.985% 99.996% 99.995%
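
For a repeatable comparison between the loaded and idle cases, dahdi_test
can be bounded to a fixed number of passes; the -c and -v flags below are
taken from the dahdi-tools usage text, so treat this as a sketch:

# run 60 one-second passes, printing one accuracy figure per pass
dahdi_test -v -c 60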

Looking at the cacti graphs, the kernel uses 100% CPU (of 400% total with 4 processor cores) when the problem above is present. top does not show the kernel CPU that cacti shows, but maybe that is by design? Asterisk is using about 15% CPU.

top - 19:32:06 up 20:57, 1 user, load average: 0.00, 0.00, 0.00
Tasks: 213 total, 1 running, 212 sleeping, 0 stopped, 0 zombie
Cpu(s): 7.4%us, 29.6%sy, 0.0%ni, 55.3%id, 0.0%wa, 0.0%hi, 7.7%si, 0.0%st
Mem: 12299332k total, 3967800k used, 8331532k free, 251432k buffers
Swap: 19529720k total, 0k used, 19529720k free, 2919456k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
30666 root 0 -20 539m 25m 6600 S 15 0.2 6:55.01 asterisk
738 root 20 0 19184 1444 1004 R 1 0.0 0:00.08 top

The old server (i386, Debian 5: Linux 2.6.26-2-openvz-686) can have 320 calls in conferences without this problem.
The new server (amd64, Debian 6: Linux 2.6.32-5-openvz-amd64) shows these problems after 50 calls.

Old server:
HP DL360 G5, 4-core Xeon E5420, 2.50 GHz, running i386 with PAE and OpenVZ (Debian Lenny), using the Broadcom NICs on the motherboard. Asterisk 1.4.42 in an OpenVZ container (uses /dev/dahdi for timing). Cacti shows CPU in kernel mode at 80% with 320 active calls in conferences.

New server:
HP DL360 G7, 4-core Xeon E5520, 2.27 GHz, running amd64 with OpenVZ (Debian Squeeze), using Intel 82571EB NICs to offload the processor, plus NIC bonding in the kernel for failover. Asterisk 1.4.42 in an OpenVZ container (uses /dev/dahdi for timing). Cacti shows CPU in kernel mode at 100% with 57 active calls in conferences.

This is a puzzle to me…
– Does anyone have experience with amd64 arch and DAHDI for timing?
– Can DAHDI on amd64 be responsible for the high CPU in kernel mode?

– I have a spare Digium TE220; would it offload the server to use it as a timing source only?
– How do I debug the high CPU usage by the kernel? Can I break this down by module in some way? (See the perf sketch below.)
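
One way to break kernel CPU time down by function and module is perf; a
sketch, assuming the linux-tools package matching this 2.6.32 kernel is
installed:

# sample all CPUs for 30 seconds, with call graphs
perf record -a -g sleep 30
# attribute samples; kernel symbols are tagged [k], modules named in brackets
perf report
# or watch live instead:
perf top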

Many, many thanks!

14 thoughts on - DAHDI for Meetme on AMD64 arch

  • Hi Johan,

    I’ve run into a similar issue before. I didn’t resolve the problem per
    se, but I got around it by modifying modules.conf to disable the loading
    of res_timing_timerfd.so and load res_timing_dahdi.so instead:

    noload => res_timing_timerfd.so
    load => res_timing_dahdi.so

    CPU load came back down and call quality has been excellent since.
    Perhaps this might work for you?
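
    To verify which timing module Asterisk actually loaded after the
    change, something like this should work from the shell (note that the
    "timing test" CLI command came with the 1.6.2+ timing API, so it is
    an assumption for older trees):

    asterisk -rx "module show like timing"
    # on 1.6.2+/1.8, exercise the loaded timing source for about a second:
    asterisk -rx "timing test"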

  • 2012-01-18 11:31, John Knight wrote:

    Hi!

    I think the timing support was included in Asterisk in 1.6.1/1.6.2.
    As I run 1.4, these modules are not available at all.

    Do you run asterisk >1.6 and amd64?

    Another option would be to port my dialplan to a newer version of
    Asterisk, if that would resolve the issue.

    A workaround I’ve been thinking about is to put a spare Digium card
    in the server just for timing, in case there is something strange
    with the soft DAHDI timing.

    I’m not very fond of the idea of rebuilding everything on the i386
    architecture, but that’s the last resort.

    /Johan

  • Ah, apologies, I just re-read your given Asterisk version. Indeed, I
    was using 1.8.5.0 at the time, not any 1.4.x release.

    Any Digium timing card will work as an OpenVZ-compatible DAHDI timing
    device; I’ve seen this work on both Virtuozzo and OpenVZ. Setting it
    up, there’s no difference in how you set up passthrough access using
    DEVNODES to the device from /dev inside the $CTID.conf file (see the
    sketch below). Just make sure permissions inside the container make
    it writable by the asterisk user.
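
    A minimal sketch of that passthrough, assuming a container ID of 101
    (the CTID here is illustrative; device paths are relative to /dev):

    # on the hardware node: expose the dahdi device nodes to container 101
    vzctl set 101 --devnodes dahdi:rw --save
    # this persists in $CTID.conf as a line like: DEVNODES="dahdi:rw"
    # inside the container, make the nodes writable by the asterisk user
    vzctl exec 101 chown -R asterisk /dev/dahdi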

    I would be interested to learn if there is any problem with the soft
    timing (core timer) when DAHDI is running on your new platform. I
    would not expect that.

    One question first though: is your new server able to keep accurate
    time with NTP, or is the clock drifting or experiencing heavy jitter?
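
    A quick way to check, assuming ntpd is running:

    # the offset and jitter columns (in ms) against the selected (*) peer
    # show how far the clock is from its reference and how stable it is
    ntpq -p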

  • 2012-01-18 16:45, John Knight wrote:
    Okay, I will try that procedure tonight.
    I’ll also remove my Intel dual NIC card and the network bonds.

    After that, the only differences between the working machine and the
    machine not working are i386 vs. amd64, and the OS version: Debian 5
    vs. Debian 6.

    Have you used 64-bit kernels (amd64) in your setup? Which distribution?

    Thanks for your advice, it’s much appreciated!

    /Johan

  • 2012-01-18 17:50, Shaun Ruffell wrote:
    The clock is kept accurate by NTP sync; it uses the vanilla Debian
    config you get if you “apt-get install ntp”. Without that sync the
    clock drifts a lot: I’ve noticed most of my HP 360/380 servers drift
    up to 10 minutes per week, including this server. But NTP fixes this,
    right?

    If you have ideas how to debug this I would be very grateful.

  • “Have you used 64-bit kernels (amd64) in your setup? Which distribution?”

    Aye, I use the current stable 64-bit RHEL6-branch OpenVZ kernel with
    CentOS 6 on the node and Scientific Linux 6 in the template, without
    issue other than what I described before with res_timing_timerfd.so
    pegging the CPU and coring Asterisk.

    It’s never a suggestion a Debian user wants to hear, but the vanilla
    2.6.32 OpenVZ kernel has effectively been abandoned by the OpenVZ dev
    team in favor of the RHEL6 version of 2.6.32, and the node shouldn’t
    really be doing anything other than hosting the templates. Have you
    considered running the CentOS 6/RHEL6 OpenVZ kernel on the node and
    Debian in the containers? No further OpenVZ development is being done
    on the vanilla 2.6.32 branch, and the RHEL6 OpenVZ kernel will
    consistently get bug fixes and backports.

    Not trying to start a distro war or anything, rather just a suggestion.

  • That’s pretty severe, and could certainly cause problems for DAHDI
    trying to use the kernel as a timing source. NTP will correct the drift,
    but the drift is still happening and it’s not corrected on every tick.
    If the ticks are not happening at the rate they are supposed to, then
    DAHDI will not be operating at the clock rate it is supposed to.
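
    For scale, a back-of-the-envelope check: the 10 minutes of drift per
    week mentioned earlier is roughly double the 500 ppm tolerance that
    ntptime reports further down:

    # 10 minutes (600 s) of drift per week (604800 s), in parts per million
    echo '600 * 1000000 / 604800' | bc    # => 992 ppm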

  • Kevin, this looks like a good candidate for using the “monotonic”
    interface in the kernel that we were talking about last week or the
    week before. The specific function call escapes me at the moment.

    Johan, I can’t do it right this second, but I’ll prepare an issue /
    patch against a 2.6.32 kernel that should make dahdi less prone to
    clock skew from NTP (although you probably want to get that fixed
    somehow) if you would be willing to test it for me on your server.

    Another thing you can try in the meantime is to switch to DAHDI
    2.5.0.2 and edit drivers/dahdi/Kbuild to enable dahdi_dummy, which,
    when loaded on recent kernels, will by default use the high-resolution
    timers (relatively inefficient for the purposes of conferencing), if
    compiled in.
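
    A sketch of that change, assuming the 2.5.0.x Kbuild carries
    dahdi_dummy as a commented-out object (the exact line may differ
    between releases, so check the file first):

    cd dahdi-linux-2.5.0.2
    # uncomment the dahdi_dummy object so it gets built (assumed format)
    sed -i 's|^#obj-m += dahdi_dummy.o|obj-m += dahdi_dummy.o|' \
        drivers/dahdi/Kbuild
    make && make install
    modprobe dahdi_dummy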

    Cheers,
    Shaun

  • 2012-01-18 20:06, Shaun Ruffell wrote:
    Didn’t think of that. I’ve turned off ntpd now to see exactly how much
    the clock skews when ntpd is not running.

    root@milkyway:/home/johan# ntptime
    ntp_gettime() returns code 0 (OK)
    time d2c1ac73.45214000 Wed, Jan 18 2012 21:39:15.270, (.270039),
    maximum error 167603 us, estimated error 815 us
    ntp_adjtime() returns code 0 (OK)
    modes 0x0 (),
    offset 522.000 us, frequency 124.445 ppm, interval 1 s,
    maximum error 167603 us, estimated error 815 us,
    status 0x1 (PLL),
    time constant 6, precision 1.000 us, tolerance 500 ppm,

    This is the server running dahdi_test at the same time:

    root@milkyway:/home/johan# dahdi_test
    Opened pseudo dahdi interface, measuring accuracy…
    99.602% 99.991% 99.614% 99.983% 99.608% 99.999% 99.611% 100.000%
    99.999% 99.998% 99.999% 99.995% 99.609% 99.613% 99.997% 99.608%
    99.611% 99.999% 99.608% 99.610% 99.998% 99.999% 99.999% 99.995%
    99.987% 99.999% 99.999% 99.999% 99.999% 99.996% 99.999% 99.999%
    99.995% 99.998% 99.999% 99.999% 99.999% 99.998% 99.999% 99.992%
    99.994% 99.999% 99.989% 99.999% 99.998% 99.998% 99.996% 99.998%
    99.998% 99.983% 99.999% 99.998% 99.999% 99.992% 99.997% 99.997%
    99.982% 99.979% 99.986% 99.993% 99.999% 99.999% 99.999% 99.995%
    99.999% 99.997% 99.993% 99.995% 99.998% 99.998% 99.999% 99.998%
    99.998% 99.998% 99.999% 99.999% 99.999% ^C

  • Slightly OT: If you re-enable dahdi_dummy the same way on DAHDI 2.6, it
    will oops on load, as it is missing a parent device.

  • In article <4F168FCC.9070300@jttech.se>, Johan Wilfer wrote:

    It may be a stupid question just displaying ignorance on my part, but
    why are you using *AMD*64 architecture on an *Intel* processor?
    Surely for 64-bit, you should be using x86_64 architecture instead?

    Cheers
    Tony

  • Tony Mountifield wrote:

    From what I’ve read, AMD came out with the extended instruction set and
    Intel just implemented it as is.

    It’s basically the same for both processors, which is why Debian names
    its 64-bit x86 port “amd64” regardless of the CPU vendor.

    Doug