Whatever Happened to Voice Recognition?

I’m not sure if you guys/gals have seen this, but here’s a comical drama of Voice Recognition Technology vs. Scottish Accent.

http://singularityblog.singularitysymposium.com/the-perils-of-voice-recognition-technology/

If we can’t nail voice recognition, then the robocalypse (which we all secretly want, because robots, even bloodthirsty murderous robots, are cool) will never happen.

Sadly, even text-to-speech isn’t very good. You’d think that with all this technology we’d at least be able to get that right.

I think you’re missing the point.

The next step after reliable voice recognition is proper language comprehension. I mean like an incredible secretary inside your computer.

For example, you could bark instructions such as the following at your computer: "I was working on a document a few months ago and can’t seem to find it. Ehh, I think I had written about 5 or so pages - it was an initial stab at implementing proprietary database replication - can’t remember if I archived it in my emails or saved it as a separate doc…"
And the computer would pop up the best candidates. "Is it one of these…"
Or even: “Can you take Jim’s presentation from last month’s management meeting - it should be on the shared drive (if not, send him an email asking for the location) - and use that format with the points I recorded yesterday. Try to make the formatting look really good. Oh yeah - can you order me a BLT - but no mayo this time. I’d like it delivered for 1:30. With a can of Coke.”

Try doing that faster with a keyboard and mouse.

@Atwood: “While it still surely has its niche uses, I personally don’t miss handwriting recognition. Not even a little. And I can’t help wondering if voice recognition will go the same way.”

@Jared Taylor: if you can type faster than you can write by hand (which most people can), then handwriting recognition is obviously pointless.

Well, I find I’m using HWR all the time. There’s a danger with the keyboard: it can be too fast. Writing quickly, without first forming one’s thoughts and argument carefully, leads to less effective and persuasive communication. For this reason my Apple Newton continues to get daily use. I’ve even got one doing duty as a web server: http://misato.chuma.org:8080/

For me, there are many situations where pulling out a keyboard device isn’t culturally acceptable. I have lots of business meetings in a day. Staring at a screen and tapping on a keyboard is seen as distracting, and if you focus on it too long, people start thinking you’re playing.

@Rob O’Daniel: “On the handwriting tangent, ya hafta wonder why Jobs & Co. didn’t build handwriting recognition into the iPad. Is it that he simply didn’t see the need to compete with tablet PCs, he didn’t believe it’d be successful enough, or that he views handwriting as a useless and all but dead technology?”

I think Steve didn’t want:

  • the bulk of a decent-sized stylus to ruin the aesthetic value of the iPad design,
  • the complaints he might get from people with poor-quality handwriting, or
  • to relearn the lessons of the Newton. Apple learned from the Newton
    experience that navigating around and getting things done was much speedier just by tapping rather than writing. The Newton UI guide talks about this and recommends that designers minimize any required writing: see http://www.4shared.com/document/a5KyvcqG/ui_guide.html and http://www.4shared.com/document/w7ztLVE0/uiguidl.html

Completely agree with Matt Dawdy!

Though I think the non-handicapped world is too addicted to the mouse and keyboard/pad to start speaking to, or 3D-izing, the work environment.

Hmm, not sure where those numbers came from. I have an acquaintance who is a speech recognition researcher, and (as far as I can remember) he told me that the current accuracy of recognition of clean speech (speech without any background noise) is about 95 to 98%. The caveat is that the recognizer has to be trained on the person’s voice and a particular acoustic model. According to him, the big problem for speech recognition isn’t recognizing clean speech; it’s recognizing speech with background noise.

Another acquaintance at a big consumer electronics company spent several years working on voice control for household items like air conditioners, but his company gave up on the research because it seems that the level of accuracy they achieved, about 80%, was unacceptable to consumers.

Sorry for the late post.

In regards to speech impediments, regional accents and general usefulness for the handicapped, you guys should check out VoiceAttack (http://www.voiceattack.com). It works with Windows’ speech recognition engine (which means it needs to be trained to your voice - which makes it good for those that have certain speech impediments). What it does, basically, is execute macros based on specific phrases that you input. It is set up mostly for gamers (keyboard input), but, you can launch/kill programs and play .WAVs and text to speech, etc. Thought it might be worth mentioning.

– Gary
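The phrase-to-macro idea behind tools like VoiceAttack is easy to sketch. Here is a hypothetical, minimal Python dispatcher (the phrase strings and action functions are invented for illustration, not VoiceAttack’s actual API):

```python
# Hypothetical sketch of the phrase-to-macro idea: map a recognized
# phrase to a callable action. Unknown phrases are ignored rather
# than guessed at, which is what makes this approach robust.

def launch_browser():
    return "launching browser"

def mute_mic():
    return "mic muted"

# Phrase table: exact phrases mapped to macros (all names are made up).
MACROS = {
    "open the browser": launch_browser,
    "mute microphone": mute_mic,
}

def dispatch(recognized_phrase):
    """Run the macro bound to the phrase, if any; return its result."""
    action = MACROS.get(recognized_phrase.strip().lower())
    return action() if action else None

print(dispatch("Open the browser"))  # launching browser
print(dispatch("format c:"))         # None - unknown phrases do nothing
```

Because the recognizer only has to match against a small fixed phrase set (rather than open dictation), even a speaker-trained engine with imperfect accuracy can be quite usable.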

You put the maximum accuracy rate of software transcription at < 80%. I assume that is regarding recognition when no context is employed.

Where the context is known, recognition can be much more accurate.

Analyzing the speech that can be recognized and then using that to help provide context to unrecognized speech is where the gains in accuracy are going to come from…

I’m a little bit hard of hearing and I often have a hard time understanding what people are saying. I use what I hear to give context to the words that I didn’t catch or misheard.

It will happen. Half the text I enter on my Android phone, I enter using the speech recognition. It does a fairly accurate job with a little light editing.

I’m reminded of this post from early Java One days, when attendees suggested using voice recognition to speed up the talk transcription process: http://java.sun.com/developer/technicalArticles/InnerWorkings/TranscriptHumor/ - especially the part about the questionably feasible “virgin control”.

First off, the idea of voice recognition in a cubicle farm isn’t a nice one. I’m stealing an example from someone else, from the days of DOS, but imagine a guy talking to the technical guru, who happens to be a bit loud: “How do I wipe my hard disk?” “format c:” “Are you sure?” “Yes!” (everyone in the room starts screaming).

As for why voice recognition isn’t progressing any more, I think the problem is older than it looks. No-one has even been trying to improve voice recognition itself for years. What they’ve been working on is statistics: this word is most likely to be followed by that one, yada yada. That can improve the results from the actual voice recognition up to a point, but at some point you have to improve the underlying voice recognition too. The stats aren’t magic - they need something to work with.
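The “this is most likely to be followed by that” statistics are essentially an n-gram language model. A toy sketch (all the probabilities below are invented purely for illustration) of how such a model would prefer “recognize speech” over the acoustically similar “wreck a nice beach”:

```python
import math

# Toy bigram probabilities (invented for illustration): roughly, how
# likely is word B to follow word A in ordinary text?
BIGRAM_P = {
    ("let's", "recognize"): 0.02,
    ("recognize", "speech"): 0.30,
    ("let's", "wreck"): 0.001,
    ("wreck", "a"): 0.05,
    ("a", "nice"): 0.10,
    ("nice", "beach"): 0.08,
}

def log_score(words):
    """Sum of log bigram probabilities; unseen pairs get a tiny floor."""
    return sum(math.log(BIGRAM_P.get(pair, 1e-6))
               for pair in zip(words, words[1:]))

h1 = ["let's", "recognize", "speech"]
h2 = ["let's", "wreck", "a", "nice", "beach"]
# The acoustics can't tell these hypotheses apart; the language model can.
print(log_score(h1) > log_score(h2))  # True
```

This is exactly the point being made: the language model can only rerank hypotheses the acoustic front end produces, so once the easy statistical gains are taken, the underlying recognition itself has to improve.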

What a relief to know I’m not alone in realising voice recognition is a LO-O-O-O-ONG way from being the solution to mankind’s ills. I’ve been hampered by a work-related arm injury for 18 months and cannot tell you how many people (employer, insurer, etc) think that just because they’ve provided me with voice recognition software, I should be just as productive as before. AAARGH!!!

Hilarious side note (not). When I asked Macspeech Dictate to cache this page so I could comment, it crashed. Hollow laugh…

Will voice recognition work?

Voice recognition is good for giving general directions.

Just LMAO’d when I saw this video attempt to handle simple programming:

http://www.youtube.com/watch?v=boYjnZVlO5I

Oh wow, that was so funny.

I started using voice-to-text recently. As a person who works at home, online, writing all day most every day, I recently developed repetitive-motion issues that start at my neck and include my entire right arm. Getting my computer to understand me seems to be the only solution.

I actually found this article while I was searching for the commands to type in a window other than notepad. I use the Vista VTTR software, and I really like it. I’d just like to be able to use it a lot more places than I currently do.

Does anybody know how I can get it to type ‘1’ instead of ‘one’? It seems fine once it hits 11; from there on it automatically types digits instead of words.
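That “one” vs. “1” behavior is a post-processing step often called inverse text normalization: dictation engines typically spell out small numbers and switch to digits for larger ones. I don’t know the Vista setting for it, but the idea itself is a simple text transform. A minimal sketch, with a word list I made up (not anything from the Vista engine):

```python
# Minimal sketch of inverse text normalization for small numbers:
# replace standalone number words with digits. The word list here is
# invented for illustration; a real engine uses configurable rules.
WORD_TO_DIGIT = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
    "ten": "10",
}

def normalize_numbers(text):
    """Swap each standalone number word for its digit form."""
    return " ".join(WORD_TO_DIGIT.get(w.lower(), w) for w in text.split())

print(normalize_numbers("chapter one of ten"))  # chapter 1 of 10
```

A real implementation also has to decide when *not* to convert (“one day”, “no one”), which is presumably why engines default to words for small numbers in the first place.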

The problem is in English itself - take the “recognize speech” / “wreck a nice beach” example from the linked article. Without paying close attention to both the context of the conversation and the words themselves, the two are hardly distinguishable. The rules get even more relaxed around direct speech, popular phrases like “Beam me up”, and jokes.

The problem is the ambiguity: particular words and phrases can be written down in several different ways. In my native language (http://en.wikipedia.org/wiki/Czech_language), I’m not aware of any word or phrase that you could write down in different ways without changing its meaning (or just trying to look stupid).

I guess there will be some cutting-edge AI programming and massive data processing behind the first real speech recognizer (one that works with all or most languages on Earth), but who knows :slight_smile:

Even humans suck at speech recognition.
One day, I heard “… the coffee at Tim Horton’s …” as “… the coffee at 天河城(tian1he2cheng2) …” because I had been to that mall.
Recently, I heard “quantum quantum” in the song Hauu Nanodesu (http://www.youtube.com/watch?v=AjNPeKgLnC8); after I looked it up, it’s actually “patapata”. I think I read Wikipedia too much.
Having an excessive knowledge base will impede voice recognition.

(insert rants about natural languages here)

I feel like the key here is “voice training”.

I had recently looked back into some free voice-recognition software, and the big thing nowadays seems to be trying to create some sort of ultimate system that requires no training by the user - some sort of phoneme/morpheme corpus that could apply to anyone.

This, I believe, is impossible. Not even humans can do it sufficiently when regional accents are involved, let alone accents caused by your first language when trying to get the computer to understand your second or third language…

The link to Robert Fortner’s article needs to be updated to:

1 Like

It’s fun to read this post a decade later.
Now people have such devices in their homes, look quite silly when they use them, and don’t care - or don’t understand - that their information is collected on a server. That is, everything they say, 24/7, unless they specifically turn the device off.
Also, your car might do the same. And the private companies you’ve basically allowed to do everything don’t always protect the data very well - like the videos of children playing around a Tesla car that got leaked.

Sorry Jeff, but the machine’s ability to work correctly was never the real issue.
It’s interesting to see how people failed to predict the true issues. Given Jeff’s capability for envisioning things, this isn’t about a lack of ability back then, but about the fact that nobody could picture companies having that much audacity, or governments failing to do their job. That is, we knew that companies and governments are like this; we just didn’t realize on what scale.

2 Likes

That, and “bad actors.” One would think they’d have learned from experience going back to the DOS days, when viruses started making the rounds by exploiting software that was written with “good intentions”, without its authors realizing it would be exploited by nefarious people, companies, and governments.

1 Like