Surfing The Web Via Asterisk.
Has anyone attempted making the web phone-accessible? I can only find one company that tried, and it operated between 1996 and 2000.
I was thinking: install Chrome with ChromeVox, headless, on a server, then use something like an AGI script to send basic keyboard commands to navigate a page as a screen-reader user would, and pipe the audio back to a channel to be streamed by Asterisk.
(Bear with me here – it’s a project for blind people involving a telephone and some lateral thinking!)
And yes, I mean more than just cURLing a page, running it through TTS and reading it out. I’m talking about using the keypad to navigate the headings and landmarks. There are just enough keys to make it viable.
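To make the keypad idea concrete, here’s a minimal sketch of a DTMF-to-navigation-action table that an AGI script could consult on each digit. The digit assignments are entirely my own invention, just to show there are enough keys to cover the common screen-reader moves:

```python
# Hypothetical mapping of phone keypad digits to screen-reader
# navigation actions (placeholder assignments, not a real keymap).
DTMF_ACTIONS = {
    "1": "previous heading",
    "2": "read current item",
    "3": "next heading",
    "4": "previous landmark",
    "6": "next landmark",
    "7": "previous link",
    "8": "next link",
    "0": "stop speech",
}

def action_for_digit(digit):
    """Look up the navigation action for a DTMF digit (None if unmapped)."""
    return DTMF_ACTIONS.get(digit)
```

An AGI dialplan loop would read one digit at a time, look up the action, and forward the corresponding keystroke to the browser.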
Of particular interest is the very high quality of Google’s ChromeVox screen-reader voice.
Does such a framework exist? I’m aware of headless Chromium, but some of the links are broken and it doesn’t seem to cover the audio/extension part.
Failing headless, what about running it on a VPS in an X Window environment? I’m not sure how I’d pass the keypresses to it without using a physical keyboard…
Any ideas, or is the whole idea complete madness? Thanks!
5 thoughts on - Surfing The Web Via Asterisk.
This is a really interesting project but I think it’s going to be seriously hard. You’re going to need to parse meaning from a site, and that’s not an easy thing to do.
If you’re focused on a few of the bigger sites then it might be easier.
You almost want a middle layer that can parse meaning from a site into XML or something.
Then I’d work on creating objects out of each kind of tag. The problem is that navigation may not work the same way it does when visiting a site visually. You’re not really going to be moving left and right; it would be more like how Tab works – a “next item” kind of thing. And items wouldn’t necessarily be in the order you see them on screen: pull-left/pull-right classes in Bootstrap, etc., would make the layout different.
I would first check whether there are any libraries that can parse HTML into objects, and if not, start building one yourself. The telephony side will be easy: you’d use AGI or something to navigate the object tree you create, and TTS to describe the current position. The hard part will be parsing the HTML, especially since most HTML is broken 🙂
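As a sketch of that middle layer, here’s a minimal pass using Python’s standard-library `html.parser` to flatten a page into an ordered list of navigable items – headings and links only; a real version would handle many more tags, ARIA landmarks, and broken markup:

```python
from html.parser import HTMLParser

# Tags we treat as "navigable items" in this sketch.
NAV_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "a"}

class NavItemExtractor(HTMLParser):
    """Collect (tag, text) pairs for navigable elements in document order."""

    def __init__(self):
        super().__init__()
        self.items = []        # (tag, text) in the order encountered
        self._current = None   # tag whose text we are collecting
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in NAV_TAGS:
            self._current, self._text = tag, []

    def handle_endtag(self, tag):
        if tag == self._current:
            self.items.append((tag, "".join(self._text).strip()))
            self._current = None

    def handle_data(self, data):
        if self._current:
            self._text.append(data)

def extract_items(html):
    """Flatten a page into an ordered list of (tag, text) nav items."""
    parser = NavItemExtractor()
    parser.feed(html)
    return parser.items
```

An AGI script could then hold an index into that list and move it forward or backward per keypress, reading each item with TTS.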
Kind regards,
Matt Riddell
Ah, no, you misunderstand. Asterisk wouldn’t care one little bit what is on the page – ChromeVox would do all that. A screen-reader user usually tabs or arrows their way around, selecting headings to read content.
Thus, Asterisk ONLY needs to be able to hear audio FROM the browser and pipe it to the channel, and to pass keypresses back TO the browser.
The human is the parser, if that makes sense?
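For the keypress leg, one hedged sketch: if the browser runs under X, the `xdotool` utility can inject synthetic keystrokes, so an AGI script could translate each DTMF digit into an `xdotool key` invocation. The digit-to-key table below is a placeholder of my own invention, not ChromeVox’s real keymap:

```python
import subprocess

# Placeholder digit-to-key table; real chords would come from the
# screen reader's keymap.
DIGIT_TO_KEY = {
    "1": "h",      # e.g. "next heading"
    "2": "Tab",
    "3": "Down",
}

def xdotool_command(digit, window_id=None):
    """Build (but do not run) the xdotool argv for one DTMF digit."""
    key = DIGIT_TO_KEY.get(digit)
    if key is None:
        return None
    cmd = ["xdotool", "key"]
    if window_id is not None:
        cmd += ["--window", str(window_id)]
    return cmd + [key]

def send_digit(digit, window_id=None):
    """Inject the keystroke into the X session (requires xdotool installed)."""
    cmd = xdotool_command(digit, window_id)
    if cmd:
        subprocess.run(cmd, check=True)
```

Splitting command construction from execution keeps the mapping testable without an X display.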
Right, so you’re using a prebuilt browser to do the parsing.
You’d really want to see if you can get ChromeVox as a library rather than as a full browser, though – otherwise you’re going to be limited to one concurrent channel and hacks like JACK audio to move the sound from the browser to the channel.
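On the audio leg: whatever carries sound out of the browser (JACK, an ALSA loopback, a PulseAudio null sink), Asterisk will ultimately want 8 kHz 16-bit signed-linear mono. A naive sketch of just the rate conversion – keep every Nth sample; real code would low-pass filter first to avoid aliasing:

```python
import struct

def downsample_s16le(raw, src_rate=48000, dst_rate=8000):
    """Decimate raw little-endian 16-bit mono PCM from src_rate to dst_rate.

    Naive decimation (no anti-alias filter) purely to show the format
    conversion an Asterisk channel would need.
    """
    assert src_rate % dst_rate == 0, "rates must divide evenly for this sketch"
    step = src_rate // dst_rate
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    kept = samples[::step]
    return struct.pack("<%dh" % len(kept), *kept)
```

In practice you’d hand this off to SoX or Asterisk’s own resampler, but the framing question (who converts, and where the buffer lives) is the same.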
I’m guessing you’re going to be wanting something closer to this:
https://www.npmjs.com/package/speech-rule-engine
Thanks. That and the tip about JACK audio look interesting, although the package above is just a parser, not a renderer.
I think, at this stage, it’s an idea to go back in the box for another day. I was hoping someone might say, “Sure, there’s an AGI you just drop right in.”
Or at least, along those lines! Thanks for the pointers anyway.