“[I]s voice control, even hypothetically perfect voice control, more effective than the lower tech alternatives?”
Depends on the situation, I guess. You astutely pointed out that people still regularly leave voicemail, but it takes SO LONG to listen to the actual voicemail (well, compared to reading the same message as text).
I presume that by lower tech you mean pushing a button or a key on your keyboard. I think voice recognition can be useful. The ideal use case is an automated telephone menu system. But I can also imagine it being incredibly useful for hands-free operation of a phone.
Enunciation and Projection
Most people speak sloppily. They don’t enunciate or pronounce their words properly, nor do they know how to project their voice so that it is consistently clear. Our brains use a lot of pattern analysis, interpolation, and contextual clues to fill in the gaps. While some speech recognition programs do some of this, they certainly don’t have the comprehensive suite of techniques that our brains have both learned and evolved over the years. It will take a while to actually figure out how those techniques work in an algorithmic sense.
Back to my assertion about people speaking poorly. I once installed Dragon NaturallySpeaking for a customer of mine who had a very outgoing demeanor, so in telephone conversations and boardroom meetings people could always hear him and understand what he was saying. Yet when he tried this speech recognition program, it clocked in at around 90% recognition because he wasn’t used to talking to a computer. Oddly enough, when a phone call came in while I was there setting it up for him, he (mentally) switched over to his proper mode of speaking (still wearing the headset that came with the program) and recognition shot up to 99+% because he was speaking clearly and enunciating his words properly.
Once he got off the phone, it dropped back to 90%. It was quite funny, but it clearly illustrates that most people don’t really understand how to talk (well, at least, how to talk to an emotionless computer).
And, well, I guess that’s the real part of voice communication that we’re not talking about here:
HOW you say something is just as important as WHAT you say.
Voice transmissions have a sub-channel that carries emotional information. Computers are currently unable to detect or process this information. I’ll concede that it is generally irrelevant in speech-to-text processing, but it’s still one of the contextual clues that we, as humans, use to fill in the gaps of what is being said.