Siri, Cortana and Google Have Nothing On SoundHound's Speech Recognition
MojoKid writes: Your digital voice assistant app is incompetent. Yes, Siri can give you a list of Italian restaurants in the area, Cortana will happily look up the weather, and Google Now will send a text message, if you ask it to. But compared to Hound, the newest voice search app on the block, all three of the aforementioned assistants might as well be bumbling idiots trying to outwit a fast talking rocket scientist. At its core, Hound is the same type of app — you bark commands or ask questions about any number of topics and it responds intelligently. And quickly. What's different about Hound compared to Siri, Cortana, and Google Now is that it's freakishly fast and understands complex queries that would have the others hunched in the fetal position, thumb in mouth. Check out the demo. It's pretty impressive.
Yes, but can it launch Waze (Score:4, Interesting)
Or does it just stare at you stupidly because using ways to give you directions means nothing if it doesn't recognize the homophone.
Re: (Score:2, Funny)
"What is the population of capital of the country in which Space Needle is located?"
Hound correctly surmises that he's asking for the population of Washington, DC...
The Space Needle is in Seattle.
Re:Yes, but can it launch Waze (Score:5, Informative)
population of capital of the country
And Washington DC is the capital of the United States, the country where the Space Needle is located.
Re:Yes, but can it launch Waze (Score:5, Funny)
What we have just learned is that SoundHound has better comprehension than some Slashdot commenters :)
Re:Yes, but can it launch Waze (Score:4)
mea culpa
Re: (Score:3)
Panic now, while there is still time!
OTOH people dumber than computer ... nothing to see here. Move along now, please.
Re: (Score:3)
Panic now, while there is still time!
Screw that. I want the computer to panic for me.
Oh, wait. Systemd...
Re: (Score:3)
Did I just witness the passing of the Turing Test?! ;-)
Re: (Score:2)
No.
Re: (Score:2)
We've all been there and done that. Welcome to club of Making Yourself Look Like An Idiot. Stay a while. There'll be cookies later.
Re: (Score:2)
Seattle isn't the capital of the country in which the Space Needle is located.
Re: Yes, but can it launch Waze (Score:2)
Re: (Score:2)
"What is the population of capital of the country in which Space Needle is located?"
Hound correctly surmises that he's asking for the population of Washington, DC...
The Space Needle is in Seattle.
Correct. Which is in the US, the country with D.C. as its capital.
Read the question again...
Re: (Score:2)
I think you missed the word "country" in the question.
Re:Yes, but can it launch Waze (Score:5, Funny)
It's not an empire, because it doesn't have an emperor.
It's not a kingdom, because it doesn't have a king.
It's not a principality, because it doesn't have a prince.
So it must be a country, because it has plenty of
Re: (Score:2)
"What is the population of capital of the country in which Space Needle is located?"
Hound correctly surmises that he's asking for the population of Washington, DC...
The Space Needle is in Seattle.
Yes, and Seattle is in Washington State which is part of a country called the United States, for which the capital is Washington, DC.
It's a poorly worded question, but apparently one this app can parse more readily than some humans ;-)
Re: (Score:3)
It's a poorly worded question, but apparently one this app can parse more readily than some humans ;-)
I think the idea is that it's not so much *poorly* worded, as *carefully* worded to be deliberately obtuse.
Re: (Score:2)
Re: (Score:2)
And what does that have to do with the question being asked?
Here's a quick reminder for you:
"What is the population of capital of the country in which Space Needle is located?"
Re: (Score:2)
Because of actress Olympia Dukakis. She has the same last name as former governor and fellow Massachusetts native, Michael Dukakis, who ran for President in the U.S. back in 1988, and if he had won, he would have been moving to Washington, D.C.
So it's totally relevant.
Re:Yes, but can it launch Waze (Score:5, Funny)
The correct answer is a number around 650k. This program is smarter than multiple slashdot commenters.
Re: (Score:2)
yes but did you listen to the video? (Score:5, Interesting)
Holy crap the video is impressive. It clearly parses phrasal and dependent logical statements like "what is the population of the capital of the country in which the space needle is located." It also parsed paragraph-long multi-part questions. I was floored.
As for homophones, how do you (human) recognize them? Well, you parse the logical context. If you are doing single-word dictation, homophones will always be a problem, but for queries there's context. And the demo shows this thing can handle some staggering conditional contexts and long phrases. So I would guess that if your query is not ambiguous in the use of the word Waze, then this thing is approaching a level where it will indeed get the right homophone.
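For anyone wondering what "dependent" means mechanically, here is a minimal sketch of resolving the Space Needle question from the inside out. It only illustrates chained lookups and is not a description of how Hound works. The three tables and the function name are made up for the example, and the population figure is just the rough "around 650k" quoted elsewhere in this thread.

```python
# Toy fact tables (hypothetical; a real system would query a knowledge base).
COUNTRY_OF = {"Space Needle": "United States"}
CAPITAL_OF = {"United States": "Washington, DC"}
POPULATION_OF = {"Washington, DC": 650_000}  # rough figure quoted in the thread


def population_of_capital_of_country_of(landmark):
    """Resolve the nested query innermost clause first."""
    country = COUNTRY_OF[landmark]        # "the country in which X is located"
    capital = CAPITAL_OF[country]         # "the capital of <country>"
    population = POPULATION_OF[capital]   # "the population of <capital>"
    return capital, population


capital, population = population_of_capital_of_country_of("Space Needle")
print(f"{capital}: about {population:,}")
```

Each clause feeds the next, so the hard part is the parse that works out the nesting order, not the lookups themselves.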
Re: (Score:2)
The challenge is the non-standard homophones. As the smart-ass AC showed, Waze is not a typically recognized homophone (of anything) because it's not a word. Recognition works great with core speech, but anything specialized usually gets mangled. Try "what is the size of a double you twelve by fifty-three." The answer, of course, is 12 inches by 10 inches. It may be one of the most common sections used in building construction. I'm not sure it would help even if you prefaced it with an "ay eye ess sea"
Re: (Score:2)
To be fair, your query doesn't make any sense to anyone who isn't familiar with an industry where they might run into a w12 (which appears to be a steel I-beam, so that would be fairly heavy construction). If you trained a speech recognition program with construction terms it wouldn't have a problem with that example.
Re: (Score:2)
It seems like that problem is solved these days. I have no problem saying "Ok google, open waze" and it does the right thing.
Re: (Score:2)
Oh, sure, but what happens when I say I need a Lyft to the airport and it brings up Uber? ;-)
Re: Yes, but can it launch Waze (Score:2)
You do realize that Google Now will happily open Waze if you say "open Waze app"? Give it some context and it knows exactly what to do.
That said, I agree that a statement beginning "open ..." could automatically be interpreted as meaning an app, but there may be reluctance to do that in case it interferes with future expansion into the internet of things, e.g. "open the curtains", or "open the garage".
Re: (Score:2)
Siri launches Waze. Is this an old problem, or a problem on other platforms?
Re: (Score:2)
I just tried saying "launch Waze" to my OnePlus One running Lollipop (5.0.2) and the stock Google Now launcher. The text appeared as "launch ways" for a fraction of a second and then corrected to "launch Waze", and sure enough the Waze app opened.
It works fine if it has context. Saying "launch" implies I want an app to open. How did you phrase your request?
Re: (Score:2)
I used Tasker + Tasker Now to fix that and so many other problems with Google Now. Basically if you tell it to Open Waze, Google will correctly use Waze instead of ways, but it won't do anything, so I added a Tasker Now rule for "Open Waze" that launches the app.
Re:Yes, but can it launch Waze (Score:5, Informative)
I take it you don't know what a homophone is so you relied on some website to check for you?
Because if you actually read what GP wrote you might notice that "waze" (which is not a dictionary word) sounds identical to "ways" (which is a dictionary word). Depending, of course, on how you pronounce ways. But a native English speaker (are you?) is almost certainly going to pronounce "waze" identically to "ways".
Whether or not this would actually result in a problem with the app being slashvertised is a different matter entirely. But I hope you have a somewhat better understanding of what homophones are and how this could be seen as a problem for such an application.
Re: (Score:2)
But a native English speaker (are you?) is almost certainly going to pronounce "waze" identically to "ways".
Actually, no. At first glance I would pronounce it with a hard Z sound, more like "was".
Are you saying no to the (are you?) in the previous post? Because, like most native English speakers, I'd pronounce waze like daze, gaze, laze, blaze, haze, etc. Which is homophonic to ways.
Re: (Score:3, Funny)
Yeah, we're called British.
Exactly (Score:2)
You've proven my point. It doesn't exist for the computer because it doesn't really understand speech.
https://youtu.be/Gqdy1jLlf50?t... [youtu.be] is how it's pronounced by its creators, but don't just take their word for it - try google translate and have it pronounce the two for you: https://translate.google.com/?... [google.com]
It's identical. It's a problem that will occur with most "hip" app names which sound like a common word, but which are spelled differently.
Re: (Score:2)
It doesn't seem like any sort of difficult problem. The speech recogniser will initially have a phonetic spelling. That can map to both waze and ways, and the specific one doesn't have to be finalised until the meaning of the entire sentence is being analysed.
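A minimal sketch of that idea, assuming the recogniser hands back a phonetic key for any ambiguous word and the spelling is chosen only once the surrounding words are known. The table, the app list and the verb list are all hypothetical, and a real recogniser would score candidates statistically rather than use one hand-written rule.

```python
# Hypothetical pronunciation table: one phonetic key, several spellings,
# ordered with the most common spelling first.
HOMOPHONES = {
    "W EY Z": ["ways", "waze", "weighs"],
}

# Hypothetical set of installed app names (lowercase).
INSTALLED_APPS = {"waze", "lyft"}

# Verbs suggesting that the next word names an app.
APP_VERBS = {"launch", "open", "start"}


def resolve(tokens):
    """Choose a spelling for each phonetic key using the preceding word."""
    resolved = []
    for i, token in enumerate(tokens):
        candidates = HOMOPHONES.get(token)
        if candidates is None:
            resolved.append(token)  # ordinary word, nothing to decide
            continue
        previous = tokens[i - 1] if i > 0 else ""
        app_matches = [c for c in candidates if c in INSTALLED_APPS]
        if previous in APP_VERBS and app_matches:
            resolved.append(app_matches[0])  # "launch <app name>"
        else:
            resolved.append(candidates[0])   # fall back to the common word
    return resolved


print(resolve(["launch", "W EY Z"]))             # ['launch', 'waze']
print(resolve(["many", "W EY Z", "to", "go"]))   # ['many', 'ways', 'to', 'go']
```

This matches what people report above with Google Now: "launch" or "open" supplies enough context to prefer the app name over the dictionary word.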
Re: (Score:2)
You are aware that most English speakers will pronounce ways, waze, and weighs identically, right?
Demo? (Score:2)
Sure this isn't some Baidu thing?
Re: (Score:3, Funny)
Nah, people actually use Baidu services. Nobody uses this crap, hence the Slashvertisement. That's what Siri told me anyway. Cortana just giggled.
How does it come to Dragon Mobile Assistant? (Score:2)
Siri, Cortana and Google are pretty bad compared to the mobile app of "Dragon NaturallySpeaking". Nuance has been the king of voice recognition for both consumer and military use. I doubt soundhound can beat them. If they do, they are in line for some hefty contracts.
Re: (Score:2)
Re: (Score:2)
I try Dragon on and off all the time. It's a major battery drain compared to Google Now.
I still keep going back and installing Dragon every few months in hopes this improves though.
Re: (Score:2)
Re: How does it come to Dragon Mobile Assistant? (Score:2)
Re: (Score:2)
Actually thanks to the northern cities vowel shift in the mid 20th century the Inland North dialect has diverged from General American dialect which is the "most pure" and "most easily understood" form of American English.
Google's send a text was useless. (Score:2, Interesting)
I tried sending a text with Google's voice engine last week just to try it out. It did a very good job of taking my dictation to text, then it asked if I wanted to send. I said yes. It spelled out yes in its little window, then asked again. I said yes again, and I tried other words; it recognized those words too, and every time asked me if I wanted to send. I finally reached over and hit the send button.
Re:Google's send a text was useless. (Score:4, Interesting)
It also only works with casual conversation.
I tried replying to a work text with something like "It's okay to use a W12x14 in place of the C section. Just make sure that it's AISC A992 grade 50" What came out was unusable, while "yo, bitch, put the dinner on the table I'll be home in 5" was transcribed verbatim. Thank goodness I had the same problem with voice send or I would have been picking up McDonalds on my way to sleep with the dog.
Actually, it really needs to automatically read it back to you, otherwise you have to read what it typed - and that defeats the purpose of being voice activated if you're driving.
Re: (Score:2)
Uh-huh (Sometimes interpreted as Uh-Uh)
Interesting. Given that the distinction between them is in pitch, it's surprising that a computer program wouldn't reliably be able to disambiguate the two.
Holy shit (Score:5, Funny)
Re: (Score:2, Insightful)
This is a dice holdings property you are referring to... Slashdot is here for pushing certain political buttons (to keep the readership "engaged") and for advertising to this "engaged" readership (to make money). Slashvertising will only get more aggressive as the readership declines in an attempt to make up falling revenues.
Re:Holy shit (Score:5, Funny)
Re:Holy shit (Score:5, Interesting)
The only thing MojoKid (1002251) wrote for this submission was "Check out the demo. It's pretty impressive," while the rest was plagiarized from the "Hothardware" article written by Paul Lilly, who does seem to be breathlessly impressed by an internal demo of an unreviewed application.
I'm going to call this a formatting error and a sad omission of credit, because I refuse to believe that someone would shamelessly lift words that they hadn't written and posit them as their own. Maybe it's the editors' fault. In either case, it's sloppy posting and comes off as skeezy no matter what the excuse might be.
Hell, just submit the rest of the article next time - why bother linking to a source or crediting an original author?
Reasons to be skeptical (Score:5, Insightful)
2. The impressive speed probably won't scale to the millions of simultaneous users Siri, Google Now, and Cortana support (assuming audio is processed in the cloud, which I admittedly don't know for sure).
3. Obviously the demo uses phrases that work. I guarantee you an ordinary person will often get "Sorry, I didn't understand the question" or whatever SoundHound's equivalent is.
4. While it sounds impressive at first blush, nobody really cares how many days it is between next Tuesday and Christmas of 2025. And that happens to be not only useless, but also pretty easy to special-case in your expert system / AI logic. So how about a demo that answers the question: "How can you make a mushroom omelette without soggy mushrooms?"
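To back up the "easy to special-case" claim in point 4, here is a rough sketch of the date arithmetic using only the Python standard library. It assumes "next Tuesday" means the first Tuesday strictly after today, which is one reading among several, and the function names are made up for the example.

```python
from datetime import date, timedelta

TUESDAY = 1  # date.weekday() numbering: Monday is 0


def next_weekday(today, weekday):
    """Return the first date strictly after `today` that falls on `weekday`."""
    days_ahead = (weekday - today.weekday() - 1) % 7 + 1
    return today + timedelta(days=days_ahead)


def days_from_next_tuesday_to_christmas(today, year=2025):
    """Count the days between next Tuesday and 25 December of `year`."""
    return (date(year, 12, 25) - next_weekday(today, TUESDAY)).days


# Asked on Wednesday 3 June 2015, next Tuesday is 9 June 2015.
print(days_from_next_tuesday_to_christmas(date(2015, 6, 3)))  # 3852
```

The arithmetic is a few lines. The hard part in a voice assistant is recognising that the spoken sentence is asking for it.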
Re:Reasons to be skeptical (Score:5, Insightful)
I mean, c'mon already! I had Dragon running on a friggin' Macintosh LCII in elementary school! That thing was running System 7.1 on a Motorola 68030 with 4MB RAM. Why can't my multi-Gigahertz smartphone with 64GB storage and 4GB RAM do the basic speech-to-text locally that a 25 year old Macintosh can?
Re:Reasons to be skeptical (Score:5, Informative)
Actually you can increase the speed of the speaking voice on Android in Settings -> Language & input -> Text-to-speech output -> Speech rate, that's what was done for this video. The recording is at normal speed.
Feel free to test it yourself, you'll notice the results are completely different from Wolfram Alpha:
https://play.google.com/store/... [google.com]
Just cleaning up the FUD, yes I work at SoundHound ;)
Re: (Score:2)
Why is the beta marked as incompatible with a Nexus 9 running 5.0.1? If my phone running 5.0.1 is compatible there's no reason the Nexus should be incompatible. I really wonder what the developers are doing to the manifest to cause things to be unavailable.
Re: (Score:2)
Does it do Australian accents?
Mine is not very broad but google gets about 50% of what I say. Which makes it almost useless
Re:Reasons to be skeptical (Score:4, Insightful)
"Any sufficiently advanced technology is indistinguishable from a rigged demo."
Re: (Score:2)
I always felt that way about this demo from 1970... where our ai still hasn't caught up:
http://hci.stanford.edu/winogr... [stanford.edu]
Re: (Score:2)
1) Not likely, the video is on SoundHound's YouTube channel - they're not hiding anything
2) I honestly don't know how they're processing it that fast... it likely isn't cloud based otherwise there would be a delay between upload/process/download... this seems nearly instant which means they either have an insane compression algorithm, a special microphone setup, or are running a local setup that improves speed beyond real world (ie: wifi with the server right next to them)
3) Yes, they repeat some of those p
Re: (Score:2)
millions of simultaneous users ... Cortana support(s ed.)
Citation needed.
Re: (Score:2)
millions of simultaneous users ... Cortana support(s ed.)
Citation needed.
*snort* :D OK, that was funny. Here's one that leaves it politely ambiguous whether this scaling is verified in actual user base or simulation: http://savas.me/2014/04/reactive-computing-at-the-heart-of-cortana/ [savas.me]
Re: (Score:2)
The nested ("capital of the country in which the space needle...") and serialized (??? and ???) queries are somewhat impressive and a good next step in AI. But to really be impressive, it needs to go further. For example, when he asked about the mortgage payment, it should have volunteered the information that the mortgage payment it calculated was principal and interest only, but that you'd typically also have to pay escrow for taxes and insurance. And it should have estimated a value for those based on cur
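For reference, the principal-and-interest figure is the standard fixed-rate amortization formula, and escrow is simply added on top. The sketch below uses made-up loan numbers, not the ones from the demo, and the tax and insurance amounts are placeholders supplied by the caller rather than estimates of anything.

```python
def monthly_payment(principal, annual_rate, years):
    """Principal-and-interest payment for a fixed-rate loan."""
    n = years * 12        # number of monthly payments
    r = annual_rate / 12  # monthly interest rate
    if r == 0:
        return principal / n
    growth = (1 + r) ** n
    return principal * r * growth / (growth - 1)


def payment_with_escrow(principal, annual_rate, years, annual_taxes, annual_insurance):
    """Add one twelfth of yearly taxes and insurance to the P&I payment."""
    escrow = (annual_taxes + annual_insurance) / 12
    return monthly_payment(principal, annual_rate, years) + escrow


# Hypothetical loan: $200,000 at 6% over 30 years.
print(round(monthly_payment(200_000, 0.06, 30), 2))                   # 1199.1
print(round(payment_with_escrow(200_000, 0.06, 30, 2_400, 1_200), 2))  # 1499.1
```

The gap between the two numbers is the point of the comment above: an assistant that quotes only the first one is leaving out a cost the borrower will actually pay.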
Re: (Score:2)
For the S.H. engineers reading the thread, I just thought of another thing I need it to do (instantly and for free of course. I'm not paying for your app)
I want to be able to say "give me a list of up to 5 single-family homes for sale in the city I'm currently in that are among the lowest-priced 10 or so homes in the three categories of price per total square foot, price per finished square foot and price per above-grade finished square foot that also have at least 1800 sqft, 4 beds and 2 or more 3/4 or big
Re: (Score:2)
Got it in one. That should also have been obvious to the idiot "reviewer" when...
"We tried pinging Google Now with the same query and were directed to a list of Google search results, which showed a bunch of entries for Hound."
Ever stop to consider why you might get a bunch of entries for Hound when you search for
Re: (Score:2)
Yeah, those fuckers, they've just made something better than the competition
That's the thing, you don't know if it's better, all you've seen is the demo. For all we know it's an empty box with some hard-coded questions and a comment:
//TODO: implement later
Re: (Score:3)
Feel free to give it a try yourself, it's available in the Android Market:
https://play.google.com/store/... [google.com]
We're currently on an invite system and anyone can request one, but the wait for one shouldn't be too long.
Yes I work for SoundHound ;)
Re: (Score:2)
Anyway, it's the first time I've seen a Star Trek-like computer working for real (well, the same special conditions could exist in a spaceship).
Not sure what you mean by this, my dear AC. AplSiri / GNow / MSCortana work the same way and can be used to create similar demos. Heck, there are a lot of similar products out there -- the Amazon Echo is even more Star Trek-like since it's always on & listening.
Have you really never played with Siri before?
Re: (Score:2)
Can waze show me the way to a place where I can weigh my cargo of whey?
Re: (Score:2)
Wow ... is this real? (Score:4, Insightful)
Script reading call-centre staff will be made redundant or downsized.
Banks, utilities, booking agencies, insurance sales ... all will use automated customer service, perhaps with switch through to a human operator on demand (at which point higher charges will kick in).
And brace yourself for robotic surveys and sales calls that sound uncannily like real people.
Re: (Score:3, Funny)
"And brace yourself for robotic surveys and sales calls that sound uncannily like real people."
How about an app to answer robotic surveys in a way that sounds uncannily like it is being answered by a real person?! AI's asking AI's questions... surely this feedback loop would result in sentience... and fury. Perhaps this is the true origin of Skynet?
Re: (Score:2)
And brace yourself for robotic surveys and sales calls that sound uncannily like real people.
I'm not too worried, I immediately hang up on the real people too.
Re:Wow ... is this real? (Score:5, Informative)
Feel free to give it a try yourself:
https://play.google.com/store/... [google.com]
Currently we are on an invite system, but a lot of people have received invites.
Yes I work for SoundHound ;)
Re: (Score:2)
"Not available in your country" :-(
Charming (Score:5, Insightful)
Your digital voice assistant app is incompetent. ...bumbling idiots trying to outwit a fast talking rocket scientist. ...
hunched in the fetal position, thumb in mouth.
Do you have to be such a douche about it?
Re: (Score:2)
Your digital voice assistant app is incompetent. ...bumbling idiots trying to outwit a fast talking rocket scientist. ...
hunched in the fetal position, thumb in mouth.
Do you have to be such a douche about it?
It was written by a computer... give it some slack.
Yeah .. we already know that about TIMMAH!
Really? (Score:3)
I'm pretty sure you don't.
I don't want to say "woof" to my phone, and i'm pretty sure even if i did Hound wouldn't know what to do with the command, since i can't actually speak dog and i'm guessing that Hound doesn't either.
Re: (Score:2)
But does it work in Scotland? (Score:5, Funny)
That's the real question and a true test of voice recognition software.
https://www.youtube.com/watch?... [youtube.com]
Re: (Score:2)
pretty weak (Score:2)
Re:pretty weak (Score:4, Insightful)
If that's true, why don't we already have programs that can make sense of human questions like this in text form?
Re: (Score:2)
How will it work under real world conditions ..... (Score:2)
One of the problems with Apple's Siri when it launched was slow response times. When you've got to have all the voice traffic transmitted over the net to the server, processed, and results returned - it causes some lag. When you've got millions of users using the thing regularly, you introduce real challenges getting all of that data processed near instantly.
With SoundHound's improvements, I suspect people will be encouraged to speak in longer, run-on sentences, as they think while speaking about all of t
A more honest title (Score:5, Insightful)
"Please buy us out!"
Comment removed (Score:5, Funny)
So that demo reminds me of something... (Score:2)
https://www.youtube.com/watch?... [youtube.com]
https://www.youtube.com/watch?... [youtube.com]
Google Voice works better (Score:2)
I got this installed yesterday, tried a couple of things, and it failed both times while Google got them right. It is quick but it really isn't very accurate at all.
SoundHound will also be a bumbling idiot... (Score:2)
Because, you know, that is the state-of-the-art in speech recognition and it is going to stay that way until actual AI gets discovered (no, it has not so far and it is unclear whether we will ever have it). That some tool can successfully pretend to be a bit less of a bumbling idiot is not impressive at all.
Nonetheless, the usual idiots will hail this as the coming of a new age and, if lucky, the company behind it will get a lot of undeserved profits.
Some "demo." Not. (Score:3)
When I saw that there was a demo, I figured it meant I would get to dictate a voice question and have SoundHound answer it.
Watch a video? That isn't a demo. If all you can do is watch a prepared video, nothing has been demonstrated at all.
You might as well say Maelzel gave a "demo" of his mechanical chess player. In a non-interactive video, you don't even know for sure it's a machine answering the question or a little man hidden in the cabinet.
Re: (Score:2)
I really don't mind a slashvertisement for a sweet bit of technology like this. It's informative as to the industry state-of-the-art. It helps me track the progress of AI. And it's cool.
Re: (Score:2)
Re: (Score:2)
The computer from Star Trek was indeed not so quick to answer, but they'd often ask it to run simulations that would bog down all of today's supercomputers and give the results, so it's certainly more advanced even if it has a lower interface speed setting.
Re: (Score:2)
Yeah,
remember this: https://www.youtube.com/watch?... [youtube.com]
So was Word Lens (Score:2)
And it works about as well as a five-year-old trying to translate. But the video made it look freaking awesome.
Re: (Score:2)
Re: (Score:2)
Great demo though.
It would be a better demo with someone who had no experience of the system. If you know which sentences it handles, you know which sentences it handles.
Re: (Score:2)
The only question of any interest is can it get me laid?
Re: (Score:2)
Or, more likely, something that Americans pass off as English. They get away with it because America has no legislation requiring truth in Advertising. Or Slashvertising.