Simple Speech Recognition For Driving IVR – “press or say one”.

Briefly: I want to be able to have “press or say (number)”, with Asterisk listening for a spoken number, but accepting a DTMF digit, too.

I’m posting everything I’ve found so far here, partly to show my working, but also in case anyone else finds it useful. So, moving on…

This looked hopeful for a moment until I realised that it doesn’t do DTMF:
https://wiki.asterisk.org/wiki/display/AST/Asterisk+15+Application_SpeechBackground

So then there’s https://wiki.asterisk.org/wiki/display/AST/Asterisk+15+Application_Record, which can terminate on any DTMF key with the “y” option, but according to the docs, “RECORD_STATUS” is only set to the flag “DTMF” (“A terminating DTMF was received (‘#’ or ‘*’, depending upon option ‘t’)”). So I don’t get to know which key was pressed via that method, either.

There’s very little information I can find about the built-in functions for speech recognition. https://wiki.asterisk.org/wiki/display/AST/Speech+Recognition+API
doesn’t actually explain how to integrate the actual speech engines.

In this previous forum post, https://community.asterisk.org/t/asterisk-15-jack-streams-speech-recognition-so-many-questions/72108/2, jcolp explained that most people don’t use the speech interface anyway, because “Asterisk modules are written in C, and it’s more difficult to do things in that fashion. Using the Record and ship it off using Python, etc, is just easier and gets the job done for a lot of people to where they find it acceptable.”

So, AGI it is! But I’m still stuck on how I record for speech AND get a DTMF if it was dialled.
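
To show roughly what I mean by that, here’s a rough, untested Python AGI sketch of the “record and ship it off” route. It leans on one assumption I still need to verify against res_agi: that the AGI RECORD FILE command reports the terminating DTMF’s ASCII code in its “result=” reply, which would also answer the “which key was pressed” part. The file path and the PRESSED_DIGIT variable name are just placeholders.

    #!/usr/bin/env python3
    # Rough AGI sketch: record the caller, escape on ANY digit, and report
    # which digit it was. Assumes RECORD FILE answers with
    # "200 result=<ascii code> (dtmf) ..." when an escape digit stops it.
    import re
    import sys

    def agi(cmd):
        """Send one AGI command and return the raw reply line."""
        sys.stdout.write(cmd + "\n")
        sys.stdout.flush()
        return sys.stdin.readline().strip()

    # Swallow the AGI environment block (it ends with a blank line).
    while sys.stdin.readline().strip():
        pass

    # Record up to 5 seconds of 8 kHz signed-linear audio; any key stops it.
    reply = agi('RECORD FILE /tmp/prompt-answer sln "0123456789*#" 5000 BEEP')

    digit = ""
    match = re.search(r"result=(-?\d+)", reply)
    if match and int(match.group(1)) > 0:
        # result is the ASCII code of the terminating key, e.g. 49 -> "1".
        digit = chr(int(match.group(1)))

    # Hand back either the key that was pressed, or an empty value meaning
    # "nothing pressed - go and do speech recognition on the recording".
    agi('SET VARIABLE PRESSED_DIGIT "%s"' % digit)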

Regarding speech in general, even “Asterisk – The Definitive Guide” just says:

“Asterisk does not have speech recognition built in, but there are many third-party speech recognition packages that integrate with Asterisk. Much of that is outside of the scope of this book, as those applications are external to Asterisk” – helpful!

The speech-rec mailing list at http://lists.digium.com/pipermail/asterisk-speech-rec/ hasn’t been posted to since 2013.

Someone else asked about speech recognition and unimrcp in this post:
http://lists.digium.com/pipermail/asterisk-users/2017-February/290875.html

UniMRCP: https://mojolingo.com/blog/2015/speech-rec-asterisk-get-started/ and
http://www.unimrcp.org/manuals/html/AsteriskManual.html#_Toc424230605
This has a Google Speech Recogniser plugin, but it’s $50 per channel: http://www.unimrcp.org/gsr

*Reasons to use Lex over Google TTS*
• Has just been released in eu-west-1:
https://forums.aws.amazon.com/ann.jspa?annID=5186
• Supports 8 kHz telephony https://forums.aws.amazon.com/ann.jspa?annID=4775
• Is in the core AWS SDK
http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/LexRuntime.html
• Has a number slot type:
http://docs.aws.amazon.com/lex/latest/dg/built-in-slot-number.html
– this means no accidental transcriptions of “won”, “one” or “juan” coming back instead of the digit 1!

The pricing is definitely right: “The cost for 1,000 speech requests would be $4.00, and 1,000 text requests would cost $0.75. From the date you get started with Amazon Lex, you can process up to 10,000
text requests and 5,000 speech requests per month for free for the first year”.

Amazon Transcribe looks promising too, but is currently only available to developers by invitation:
https://aws.amazon.com/transcribe/ https://aws.amazon.com/transcribe/pricing/

But all I need now is the quickest, simplest way to send Lex a short 8 kHz file and get a single digit back, as reliably as possible.
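
For what it’s worth, this is roughly the Lex call I have in mind, using boto3’s lex-runtime post_content. Untested sketch: the bot name, alias and slot name are placeholders, and both the 8 kHz LPCM content type and the exact shape of the returned “slots” would need checking against the current Lex documentation.

    #!/usr/bin/env python3
    # Sketch: post a short 8 kHz recording to Amazon Lex and pull one digit
    # back out. Assumes a Lex bot whose intent has a single AMAZON.NUMBER
    # slot; bot/alias/slot names are placeholders.
    import json
    import boto3

    lex = boto3.client("lex-runtime", region_name="eu-west-1")

    def recognise_digit(raw_pcm_path, caller_id):
        # Send headerless 8 kHz, 16-bit, little-endian PCM (e.g. an
        # Asterisk .sln recording).
        with open(raw_pcm_path, "rb") as audio:
            response = lex.post_content(
                botName="IvrDigitBot",    # placeholder
                botAlias="prod",          # placeholder
                userId=caller_id,
                contentType="audio/lpcm; sample-rate=8000; "
                            "sample-size-bits=16; channel-count=1; "
                            "is-big-endian=false",
                accept="text/plain; charset=utf-8",
                inputStream=audio,
            )

        # Depending on the SDK version, "slots" may come back already
        # parsed or as a JSON string from the x-amz-lex-slots header.
        slots = response.get("slots") or {}
        if isinstance(slots, str):
            slots = json.loads(slots)
        return slots.get("Number")        # placeholder slot name

    if __name__ == "__main__":
        print(recognise_digit("/tmp/prompt-answer.sln", "caller-1234"))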

Before I travel too far down this road, can someone point me in the right direction and possibly steer me away from the wrong path?!

9 thoughts on “Simple Speech Recognition For Driving IVR – ‘press or say one’”

  • Thanks Jurijs,

    Yes, in fact I’m already using that, and it works fine. The problem here is that I cannot find a way of recording speech AND listening for a DTMF digit being pressed as an alternative.

    That’s where the problem lies.

    J.

  • Hi,

    Please check its code. It listens for # and it is quite easy to add all the other keys, 1-9 and so on.

    Then change the code accordingly so the script returns the value of the key.

    As far as I remember it wasn’t hard.

    With kind regards,

    Jurijs

  • Hello,

    Maybe you can do this by mixing your current code with an ARI application. I mean:

    – just before entering the speech recognition AGI, enter your ARI
    application

    – in the application, subscribe to the channel’s events, set up DTMF event handlers, and call “continueInDialplan”

    – then enter the speech recognition AGI as before

    Regards

    Jean Aunis


  • UniMRCP with one of the various speech recognition providers they support definitely works for this.

    Specify multiple grammars in the MRCP call. One for text to listen for. Another for the DTMFs to listen for. The results will indicate which grammar and what was detected.

    The combination of voice and/or DTMFs is exactly what speech recognition has been designed for. I am very pleased with UniMRCP and the support they have given us.

  • Thanks for your responses – it looks like I have the following options, in order of ease:

    1: Modify and recompile app_record.c

    Change line 471
    https://github.com/asterisk/asterisk/blob/master/apps/app_record.c#L471
    from
    status_response = "DTMF";
    to
    status_response = dtmf_integer;

    Pro: Free, easy. Con: Have to remember to re-edit the module each time a new Asterisk update comes out.

    2: Use the Jean Aunis “mix ARI and AGI” trick (rough sketch below). Pro: Doesn’t need recompiling on each Asterisk release. Con: A bit of fiddling and requires an ARI library.

    3: Pay $50 for the UniMRCP module. Pro: Does what I need. Con: $50 per channel, requires an account, and lots of setup just to add DTMF to the speech recognition I’m already doing.

    Yes? No? None of the above? Other?!
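
    For option 2, this is roughly what I picture, talking to ARI’s REST/WebSocket interface directly with requests and websocket-client rather than pulling in a full ARI library. Untested sketch: the app name, credentials and port are placeholders, and the dialplan would need a Stasis(dtmf-watcher) step just before the speech recognition AGI.

        #!/usr/bin/env python3
        # Sketch of the "catch DTMF over ARI, then continue in the dialplan"
        # idea. Assumes ARI is enabled on localhost:8088 with the placeholder
        # credentials below, and that the dialplan calls Stasis(dtmf-watcher)
        # just before the speech-recognition AGI.
        import json
        import requests
        import websocket  # pip install websocket-client

        ARI = "http://localhost:8088/ari"
        AUTH = ("ari_user", "ari_secret")  # placeholders
        APP = "dtmf-watcher"

        def subscribe(channel_id):
            # Explicit subscription, so events keep flowing after the channel
            # leaves the Stasis app and runs the AGI.
            requests.post("%s/applications/%s/subscription" % (ARI, APP),
                          params={"eventSource": "channel:%s" % channel_id},
                          auth=AUTH)

        def continue_in_dialplan(channel_id):
            # Hand the channel straight back to the dialplan so the AGI can
            # run while this app keeps receiving the channel's events.
            requests.post("%s/channels/%s/continue" % (ARI, channel_id),
                          auth=AUTH)

        def main():
            ws = websocket.create_connection(
                "ws://localhost:8088/ari/events?app=%s&api_key=%s:%s"
                % (APP, AUTH[0], AUTH[1]))
            while True:
                event = json.loads(ws.recv())
                if event["type"] == "StasisStart":
                    channel_id = event["channel"]["id"]
                    subscribe(channel_id)
                    continue_in_dialplan(channel_id)
                elif event["type"] == "ChannelDtmfReceived":
                    # The digit Record()/the AGI can't report on its own.
                    print("channel %s pressed %s"
                          % (event["channel"]["id"], event["digit"]))

        if __name__ == "__main__":
            main()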

  • Hang on, all of the fiddling in this thread seems to be remarkably over-complicating what should be an incredibly simple task.

    We know that a DTMF keypress interrupted the recording. We also know that app_record.c knows which keypress it was, from:

    * \param dtmf_integer the integer value of the DTMF key received

    as in

    static enum dtmf_response record_dtmf_response(struct ast_channel
    *chan, struct ast_flags *flags, int dtmf_integer, int terminator)

    For reasons which have me scratching my head, app_record turns a useful DTMF value into a rather meaningless “DTMF” in the RECORD_STATUS variable.

    But SOMETHING must be floating around in Asterisk for app_record.c to know what number was pushed. If I’m using RFC2833, is there ANY way of getting that last keypress?

    In other words: “The user pressed a number, recording stopped, now what was that number?” – WITHOUT rewriting and recompiling a core application or doing any complex workaround?

    Thanks

  • When originally added, it was only possible to terminate based on a termination DTMF, so you’d know which DTMF key was used because no other DTMF would stop it. Afterwards a community member contributed a change[1]
    to add an option allowing any DTMF key to terminate the recording, but the dialplan variable handling was not extended to expose which DTMF was used.

    Within the code f->subclass.integer is where the DTMF digit is. You’d need to make a code change to set another dialplan variable which contains it.

    [1] https://issues.asterisk.org/jira/browse/ASTERISK-14380