
Talk of the Town

The most powerful software of the future may be your very own voice.

By ROBERT McGARVEY - Posted 11/01/2006
American Way

Your wallet is empty and you're hungry, so you do the logical thing: You ask your car, "Where's the nearest ATM, and, by the way, where's the nearest sushi restaurant?" No, you're not crazy. You're just driving a Honda Accord equipped with Touch by Voice, a voice-recognition system powered by IBM. Seconds later, the car talks back through its speaker system, telling you where to load up on cash and also where to score a California roll. It all happens so effortlessly that you forget you're talking to a computer.

Slowly, discreetly but pervasively, voice recognition - where computers hear us speak and know what we mean - has become a part of our everyday lives. "It's amazing how often most of us now use voice recognition, frequently without realizing we are," says Peter Mahoney, vice president of marketing for Nuance Communications, a Burlington, Massachusetts-based developer of tools for what the trade calls "voice rec."

About a decade ago, when big companies first began experimenting with voice recognition, we definitely knew we were tangling with it, because most of the time the systems did not work. Computers would whine, "Could you say that again?" or "Sorry. I don't understand." Our reaction was a swift no thanks - give us a person to speak with. Now computers do understand us. "Accuracy is much better today," says Mahoney, thanks to computers that are smarter and more powerful. Underlying recognition algorithms (the math that shapes the systems) have gotten better, too: 10 years of input has let researchers tweak their formulas so that we can speak more naturally and still be understood.

Also fueling this recent consumer embrace of voice recognition is what Mahoney delicately refers to as a "backlash against talking with nonnative speakers," which, put more plainly, means that many of us would rather talk to a computer than to someone at a call center in a developing country. "We are a self-serve society," adds Cambridge, Massachusetts-based Paul Kowal, coauthor of Enabling IVR Self-Service with Speech Recognition. "Voice-recognition systems are always friendly." They never tell us we are wrong, they are unfailingly cheerful, and, increasingly, they're indeed giving us what we want. What's not to like?

Voice recognition is also cheap - low costs are driving many deployments as companies look for ways to save money on human employees, who require salaries. But the real excitement swirling around this software is the growing recognition that voice is one data-input device most of us always have with us, particularly in an age of ubiquitous wireless phones. Companies are now learning to harness voice inputs so that we can do truly cool things more quickly and easily than ever.

"Using your voice to get the information you want is 10 times faster than using a mobile phone's keypad," says Dipanshu Sharma, CTO of San Diego-based V-Enable, a pioneer in developing speech-based search tools. Of course, a conventional wireless phone can be used to search for, say, movie showtimes - but go ahead and type in "Snakes on a Plane" and your zip code. Wouldn't it be much faster just to ask your phone for this information? "With our service, that's what you do," says Sharma, who also says Verizon Wireless subscribers already can tap into this voice-powered service.

"Voice is quicker and smarter than your finger is," agrees Tom Freeman, a cofounder of VoiceBox Technologies in Bellevue, Washington. "Using keystrokes, you'll probably need at least eight to download a new ringtone for your phone. Using our tools, just say, 'Show me ringtones by Usher.'" Freeman adds that, lately, voice-rec tool providers have been upping the ante. "We don't want the system ever to say, 'Sorry. I don't understand.' We are trying to build in context awareness. The better we understand your context, the more likely we are to understand what you want." He provides this example: Say you call a help number and mumble into your cell phone, "Blah-blah traffic." Are you asking about a 1970s super band? Michael Douglas movies? Local road conditions? If you were to call into a movie hotline, the system would have a head start in giving you the right information. "We are getting smarter about building a hypothesis about what the user wants to know. That's as important in improving responses as are the gains in understanding the spoken words," says Freeman.

Here's how smart voice rec has become. You are in a hopping, noisy bar in Bangkok, and suddenly you're overcome with the urge to sing "Jumpin' Jack Flash." Dial into Grammy Thailand, a Thai wireless and entertainment provider, say the song title, and bam! - out blasts a karaoke-perfect version of the Rolling Stones classic. "Put your phone in speaker mode and sing right along," says Mike Katz, director of product marketing at NMS Communications, a Framingham, Massachusetts, developer of communications technologies.

Hold on, though, because you really haven't heard anything yet.

IBM's voice-recognition guru, Brian Garr, says scientific ambitions are white-water fast when it comes to what voice recognition will do next. IBM's Superhuman Speech project, for instance, aims to create computers that are better at understanding speech than humans are, says Garr. That's right, better than we are. "We believe we will be there by 2011," he says.

"We're still in the early days of speech recognition, comparable to where the Internet was 10 years ago," adds Garr. But, watch out, he says, because just as the Internet became integral to our lives, so will speech rec, probably a lot faster than we expect. "We're just now figuring out so many new uses."

The examples keep multiplying. A case in point, coming probably within the next year to your cell phone: "You'll be able to dictate SMS messages into your wireless," predicts Nuance's Mahoney, whose company is far along in its development of that very tool.

Picture zapping this message to a coworker: "SMS iz kewl 2 uz, bt a pain 2 typ, w aL d multi-tapping. It wud b so gr8 jst 2 spk it!" How long would it have taken you to tap that into a phone? And that's assuming you know the SMS shorthand that allows quicker input. But it would be many times faster just to dictate your message and let the smart phone do the typing for you.

"Big leaps are coming in the near future," promises Mahoney. The technology, finally, is here - computers hear and understand us. Now it comes down to creating tools we want to use - and that, says Mahoney, is exactly what's going on. Can you hear it happening?

Source: American Way