Whatever Happened to Voice Recognition?

Very interesting topic. Maybe before we can get to 100% voice recognition we need prior advancements in the field of Artificial Intelligence. I mean, unless a computer ‘understands’ us the way we understand another human being, it can’t really tell the difference between the phonetic representations of ‘their’/‘there’ or ‘they’/‘day’. Guess it needs ‘common sense’ to guide it in choosing which interpretation to pick.

Dunno how we’re gonna infuse common sense into our computers. What if it meant taking whatever technology we have, everything we’re building on, and tearing it down to the basics to come up with something completely new? Maybe someone will actually do this and show us that, yes, voice recog tech is worth even the cost of global computer eradication… even if only temporarily :)

Well for the time being it would be cool if we just had everyone use their fingerprints or retinas to identify themselves over the internet… hmmm or maybe voice prints (dunno if each voice is unique) :slight_smile:

I’d like to see a lot more points on that graph before drawing those nicely decreasing lines on it. I really wouldn’t be surprised if voice recognition algorithms are prone to a kind of punctuated equilibrium: new techniques are adopted, giving large improvements, followed by periods of stagnation (or only slight improvement).

@Hymnos, never, ever trust Gates. He has been wrong much more often than right. He blurbs stuff and hopes it comes true one day. I think MS is only as big as it is because it had Gates as skipper on the business side. He was smart enough to realize he is not as smart as people make him out to be, so he hired lots and lots of smart people to work for him instead.

So in that sense he is brilliant, for the rest, not so much.

“Computers Make Strides in Recognizing Speech” is an article from the NYTimes that mentions recent advances in speech recognition.

http://www.nytimes.com/2010/06/25/science/25voice.html?hp

There are too many homophones.

Exactly.

Voice mail to text would be nice. I can’t understand someone leaving their phone number at 700,000 cpm anyway. Maybe the sender could be forced to edit the transcript before sending?

Voice will never work until we are required to prepend “Computer” to every command.

Derr.

For some reason, activating speech-to-text on Android is pretty rough in the default OS. Even then, it’s pretty lame sometimes: if your vocabulary is a little different, it does NOT handle it well. I’ve definitely thought about getting a better speech-to-text program, but I’m too cheap.

Ubiquitous voice programming, it seems, will have to wait for a universal ID system that includes your own personal voiceprint. What’s interesting is that if you had a very recognizable word (‘computer’, for example) reserved for starting authentication, and a unique marker (e.g., a small radio transmitter), you could probably build a centralized system that did voice recognition much more effectively.

SOoooooooo… You want Star Trek, wait ten years and hope cloud computing catches up to what you want. Someone’s working on it.

One thing no one seems to be considering is how important the OS is when it comes to input methods (voice, touchscreen, mouse/keyboard, etc.).

The touchscreen was not invented for the iPhone, yet Apple’s iPhone OS was the KEY to making good use of the technology. Why couldn’t you apply the same thinking to voice recognition?

Just try to imagine a new OS which would be centered around voice recognition, and thus new applications also made to be used with voice.
Windows (and every app made for it) is designed around the very idea that a mouse and keyboard will be used as inputs, the iPhone around your fingers, and they work fine that way. Why would voice work efficiently to control these OSes?

Would you need a taskbar, icons or the minimize/close buttons on a real voice-controlled OS? Of course not, it’d be ridiculous.

It’s hard, of course, to reimagine things in a new way, but hey, Apple did it.

Seconding the NY Times article:
http://www.nytimes.com/2010/06/25/science/25voice.html
It disagrees with what you’re saying here.

You’re correct, Jeff, it doesn’t work because the bandwidth of spoken communication is severely limited.

Ever heard how a picture says a thousand words? I would say a mouse click says a few hundred.

<This is replying to the blog post only, not to comment posts.>
Of course, because human language is always changing in tone and usage, speech shouldn’t be the way we communicate with a computer. Computers are in theory “perfect”, and human speech is far from that standard. So ditching speech recognition entirely and moving on to some other input method is most likely the easiest and cheapest path to take. Speech recognition can still be useful in some cases, though, like for people who have a hard time typing (e.g., due to a handicap). But if we explore other input methods, maybe someday we will find something far better than speech recognition.
I hope this made sense. I just type the words as I think them up. :slight_smile:

When you constrain the voice input to a more limited vocabulary – say, just numbers, or only the names that happen to be in your telephone’s address book – it’s not unreasonable to expect a high level of accuracy.

Expect, yes, achieve, no. On an iPhone with over 400 contacts I’ve found the Voice Control dialing to have less than 5% success. It improves, slightly, if you append “home”, “mobile” or “work” to the end of “call so-and-so”, presumably by limiting the search to the set of contacts that have the available services, but still usually calls the wrong person.

And you should forget about using Voice Control to play tracks or albums in a music collection of over 2,000 tracks.

Regarding Minority-Report-style hand-waving interfaces, there’s always the obligatory OK/Cancel comic: http://okcancel.com/comic/3.html

An interface that forces you to wave your hand around in mid-air is very tiring. It may work for a minute but not for a work day.

As for handwriting recognition: it definitely has uses. I don’t use it to write larger bodies of text, but I do own a Tablet PC and frequently use OneNote for sketches and diagrams. I still do most of the text there by typing, but the very fact that even my handwriting is searchable (and the recognition is pretty good) is a big plus for anything I quickly scribble down. The Math Input Panel in Win 7 also does its job quite well (once you’re accustomed to writing numbers the way it expects them; for me, writing a 1 as a simple line is still awkward), and for some formulas it’s faster than typing.

The main problem with Tablet PCs, though: quickly changing from keyboard to pen input involves turning the display. Direct manipulation on the screen with a pen is nice, but you notice accuracy errors and parallax, and having the screen flat on the table doesn’t let you see it as well as having it angled. An angled screen, in turn, is unsuitable for pen input.

Handwriting recognition does exist on the iOS devices in the form of Chinese input and it’s actually quite impressive.

I wonder how much has really been invested in voice recognition.

Voice recognition tech would come in handy when interacting with automated homes, appliances, cell phones, etc. So to close the master bedroom shades, you’d say “Upstairs Master Shades Close”. The software would be listening for a small group of voice commands for the first word, like “All, Upstairs, Downstairs and Basement”. This first command tells the computer which floor to activate. On each floor there is a series of other voice commands, which would be the room list: “Bath, Children’s, Guest, Master, Closet and Hallway”.

So with each command you get more and more specific. Far from conversational speech but it is logical and direct. No bullshit = No error. Probably would not have to voice train with a minimal voice interface like this.
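The floor → room → device → action scheme described above amounts to walking a small command tree, where at each step the recognizer only has to distinguish a handful of words instead of open-ended speech. Here is a minimal sketch of that idea; the command names follow the comment, but the tree structure and function are purely illustrative, not any real home-automation API:

```python
# Toy sketch of a hierarchical voice-command vocabulary. At every level the
# recognizer matches against a tiny word list, which is what makes high
# accuracy plausible. All names and structure here are hypothetical.
COMMANDS = {
    "Upstairs": {
        "Master": {"Shades": ["Open", "Close"]},
        "Bath": {"Lights": ["On", "Off"]},
    },
    "Downstairs": {
        "Hallway": {"Lights": ["On", "Off"]},
    },
}

def parse_command(words):
    """Walk the command tree one word at a time; return the matched path,
    or None if any word falls outside the expected vocabulary."""
    node = COMMANDS
    path = []
    for word in words:
        if isinstance(node, dict) and word in node:
            node = node[word]
            path.append(word)
        elif isinstance(node, list) and word in node:
            path.append(word)
            return path  # reached a final action: command complete
        else:
            return None  # word not in the small vocabulary at this level
    return None  # ran out of words before reaching an action

print(parse_command(["Upstairs", "Master", "Shades", "Close"]))
# → ['Upstairs', 'Master', 'Shades', 'Close']
print(parse_command(["Upstairs", "Guest", "Shades", "Close"]))
# → None (no 'Guest' room defined upstairs in this sketch)
```

Because every step narrows the search to a few known words, there is almost nothing for the recognizer to confuse, which is the "No bullshit = No error" property the comment is after.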

Now, in terms of voice transcription like the Google YouTube video thingy, I assume you’d need faster processors and much more thorough programs to determine the difference between the spoken versions of ‘site’ and ‘sight’. That’s context recognition. It’d have to be able to understand context through ‘keyword’ collection and relationships. The semantic web comes into play here, and that’s not here yet. Advanced machine learning and AI also have to be present, and vast databases will be called upon, so much faster processors will be needed.
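In practice, the simplest form of this "context recognition" is a statistical language model: when the acoustics are ambiguous between ‘site’ and ‘sight’, score each spelling by how likely it is given the neighboring words. Below is a toy sketch of that idea with hand-made bigram counts; real systems use models trained on huge corpora, and these numbers are invented purely for illustration:

```python
# Illustrative bigram counts (made up for this sketch). A real language
# model would estimate these from a large text corpus.
BIGRAM_COUNTS = {
    ("web", "site"): 50, ("construction", "site"): 20,
    ("beautiful", "sight"): 25, ("a", "sight"): 15,
    ("web", "sight"): 0, ("beautiful", "site"): 1,
}

def pick_homophone(prev_word, candidates):
    """Choose the candidate spelling most likely to follow prev_word,
    according to the bigram counts (unseen pairs count as 0)."""
    return max(candidates, key=lambda w: BIGRAM_COUNTS.get((prev_word, w), 0))

print(pick_homophone("web", ["site", "sight"]))        # → site
print(pick_homophone("beautiful", ["site", "sight"]))  # → sight
```

Even this one-word window resolves many homophones; modern recognizers extend the same principle to longer contexts, which is part of why they need the big data and fast processors the comment mentions.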

But why screw with voice recognition when thought recognition is the real aim? It’s much more elegant and minimal than voice. Less waste and confusion. Along with thought comes vision, so future computer interfaces will be purely visual and mental. What needs to happen is a full merger between the fields of synthetic biology, nanotechnology, AI, machine learning, computer hardware, etc. We can really start to see that the lines in some cases are beginning to blur. http://www.youtube.com/watch?v=IyAOepIU6uo

Word.

There is a pretty successful science fiction writer named David Weber who does not type his books but dictates them with voice recognition software, after an accident of some kind left him unable to type. He discussed this in an interview on a science fiction site called thedragonpage.

go to cover to cover -> show archives (drop down)

His interview is there. The next show actually discussed writing books using voice recognition software instead of typing.

Hold your horses. It hasn’t arrived yet. It’s called a voice recognition card similar to a video card. Be patient.

If you think back to the sophistication required to complete many of the common computer requests in Star Trek, I think you’ll realize that accurate recognition of human speech was pretty trivial in comparison.

Usually there would be a request like, “Computer, analyze this phenomenon and then search your data banks for any previous occurrences. What is the most probable cause?”

Star Trek computers were apparently all sentient beings. They were just extremely modest sentient beings and didn’t want to make a big deal about it.

Ah, “data banks”. As a phrase it seemed to make so much more sense back then.

-irrational john

Hopefully someone will read this.

I live in a slight version of terror that speech recognition will become commonplace. I stutter. Not terribly, but it’s there. It is even more pronounced when I have to pause, and clearly say something without any type of preamble. It’s even worse when it is my last name.

I program for a living, and love it. However, if I think people are listening to me, my stutter becomes more pronounced. I can sing fluently, but I think singing might make me feel stupid, screw up the voice recognition software, etc.

As we all (should) take handicaps into account as we program (blindness, deafness, etc.), please, let’s not forget the people in the world with a very un-publicized impediment: the stutterers.