The following FAQs answer questions about building applications that use voice search. Technical FAQs and other information are available to official V-ENABLE developers.
1. Which platforms does voice search work on?
2. What are the capabilities of mobile voice search?
3. What is the accuracy?
4. What is the latency?
5. How does it work?
6. Are there any commercial applications?
7. How easy is it to convert current applications?
8. What protocols, languages, ports does it use? Is it standards-based?
1. Which platforms does voice search work on?
We have found through experience that the most effective platforms for client-server
speech recognition are BREW, Java, and Symbian. Applications using WAP or xHTML
browsers can also integrate our voice search technology. Our veANYWAY™ platform
is designed to work with all of these clients.
2. What are the capabilities of voice search?
Voice search is very effective with basic content/keyword databases
(artist names, city/state, business directories). Given the limitations of
cell phone memory, bandwidth, and today's Automated Speech Recognition technology,
full speech-to-text is not yet ready for commercial use in applications.
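One way to see why a fixed keyword database is a far easier target than open dictation: a slightly wrong recognition result can still be snapped to the nearest known entry. The sketch below is only an illustration under assumptions. The `CITY_STATE` list and the `match_keyword` helper are hypothetical, and Python's standard `difflib` string matching stands in for whatever matching V-ENABLE actually uses.

```python
import difflib

# Hypothetical keyword database; a real directory would hold many more entries.
CITY_STATE = ["San Francisco, CA", "San Diego, CA", "Santa Fe, NM", "Sacramento, CA"]


def match_keyword(hypothesis, vocabulary, cutoff=0.6):
    """Return the vocabulary entry closest to the recognizer's text, or None.

    Assumption: the recognizer hands back plain text that may contain small errors.
    """
    # Compare case-insensitively, but return the entry in its original spelling.
    lowered = {entry.lower(): entry for entry in vocabulary}
    hits = difflib.get_close_matches(
        hypothesis.lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[hits[0]] if hits else None


if __name__ == "__main__":
    print(match_keyword("san fransisco ca", CITY_STATE))
```

Because every answer must come from the list, an input with no close entry simply returns `None` instead of an arbitrary transcription.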
3. What is the accuracy?
Metrics from current commercial applications indicate a very high level of
recognition. For example, our ringtone applications have shown up to 90%
accuracy when speaking the Artist Name. Our directory application has also
shown up to 90% accuracy when speaking the City/State.
4. What is the latency?
Metrics from commercial applications now show a negligible 1-3 second
latency from the time the speech input (e.g., "San Francisco, CA")
is recorded and sent to the time the information (San Francisco, CA) is
received. The majority of that time is network latency. As network bandwidth
increases and handsets continue to improve, this latency should decrease further.
5. How does it work?
Within current applications, the mobile user has the option of entering
data into a field with either speech or text. To use speech, the user holds
the TALK key, speaks the information they would like to input or search for
(Artist Name, Movie Title, City/State), and releases the TALK key. The speech
is recorded, optimized for the mobile environment (using compression,
noise filtering, etc.), and processed through an ASR (Automated Speech Recognition)
system via the veGATEWAY™. Once recognized by the ASR, the keyword is
sent to the relevant content server and processed using our proprietary
search algorithm, which delivers the most relevant content along with related
content to the mobile user. If no content is available for that request, our
recommendation engine offers similar content to the end user.
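The server-side flow described above (recognize, look up, fall back to recommendations) can be sketched roughly as follows. This is a minimal illustration, not the veGATEWAY™ implementation. `CATALOG`, `recognize`, and `handle_request` are hypothetical names, the ASR step is replaced by a stub that treats the "audio" as UTF-8 text, and `difflib` similarity stands in for the proprietary search and recommendation logic.

```python
import difflib

# Hypothetical content catalog: keyword -> content items.
CATALOG = {
    "madonna": ["Madonna - Ringtone A", "Madonna - Ringtone B"],
    "metallica": ["Metallica - Ringtone A"],
}


def recognize(audio: bytes) -> str:
    """Stand-in for the ASR step: the 'audio' here is simply UTF-8 text."""
    return audio.decode("utf-8").strip().lower()


def handle_request(audio: bytes, catalog: dict) -> dict:
    """Recognize the keyword, return matching content, else similar content."""
    keyword = recognize(audio)
    if keyword in catalog:
        # Content exists for the recognized keyword.
        return {"keyword": keyword, "results": catalog[keyword], "recommended": False}
    # No content for this request: offer content under the most similar keywords.
    similar = difflib.get_close_matches(keyword, list(catalog), n=3, cutoff=0.5)
    results = [item for name in similar for item in catalog[name]]
    return {"keyword": keyword, "results": results, "recommended": True}


if __name__ == "__main__":
    print(handle_request(b"Madonna", CATALOG))
    print(handle_request(b"madona", CATALOG))
```

The `recommended` flag lets a client tell a direct hit from a fallback suggestion, so it can label the results accordingly.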
6. Are there any commercial applications?
V-ENABLE has deployed 9 different applications on 7 operators. Applications
have been deployed on carriers such as Verizon Wireless, US Cellular, ALLTEL,
Cricket, Verizon Dominican Republic, and Verizon Puerto Rico.
7. How easy is it to convert current applications?
V-ENABLE has built SDKs for BREW, Java, and Symbian developers to easily integrate
code within their applications to add speech search. Custom development work
may be required for WAP and xHTML integration depending on the application.
Average integration time for the veCLIENT™ SDK has been 3-5
business days.
8. What protocols, languages, ports does it use? Is it standards-based?
The veGATEWAY™ supports all wireless standards for protocols and speech
recognition, including WAP, SMS, MMS, VoiceXML, X+V, and MRCP. V-ENABLE can support
the following languages: U.S. and U.K. English, Italian, Spanish, Catalan,
Brazilian, Portuguese, French, German, Greek, Argentinean Spanish, Chilean
Spanish, Swedish, and Dutch.