Tuesday, September 2, 2008

iPhone speech recognition - status

The iPhone mania is pushing many vendors to offer speech recognition on this platform. Since there are many offerings, I have put together a summary that gives a simple picture of the current status. It may not cover every iPhone speech offering, and I will be glad to receive comments about additional speech recognition solutions for the iPhone.

There are many posts on the net about speech recognition on the iPhone, some about command and control and some about free-speech web search. Many people are disappointed that there is no speech recognition for the iPhone, while others discuss working or semi-working applications. A few names come up repeatedly:
  1. AT&T
  2. Nuance
  3. VoiceSignal
  4. VoiceDialer
  5. VOICE DIAL
  6. VoiceThis
  7. Fonix


AT&T recently revealed a research project, the Watson Speech Mashups Architecture: a new software framework that exposes AT&T’s WATSON speech recognition as a web service, to bring speech processing technologies economically to the wider web and mobile developer community. This capability provides network-hosted speech technologies for multimedia devices with broadband access (iPhone, BlackBerry®, IPTV set-top boxes, smartphones, etc.) without the need to install, configure, and manage speech recognition software and equipment. This enables easy and rapid development of new speech and multimodal mobile services as well as new web-based services. The implementation is based on well-established web programming models such as SOA, REST, AJAX, JavaScript and JSON.

AT&T provides a video of the YellowPages.com application with speech recognition on an iPhone.



Are Nuance and VoiceSignal promoting separate solutions, or the same one? VoiceSignal is part of Nuance, yet it has presented its own iPhone speech recognition solution. I am confused, and I bet many people inside Nuance are confused too.

I have written in the past about the Nuance iPhone offering. From the various announcements, what I understand is that Nuance named its solution OVS, for open voice search (in some cases it is referred to as open vsearch). It is also known as the Mobile Solution Suite. Surprisingly, the VoiceSignal website hosts a demo video of vSearch, a speech-recognition-based search application for the iPhone. The application logo displayed on the iPhone in this video is the VoiceSignal logo (unlike the Nuance vsearch demo, which of course shows the Nuance logo).
If you are still not confused, take a look at the Nuance announcement from June 10th, 2008 and compare it with the VoiceSignal announcement from August 24th, 2007:


Nuance Unveils Voice Search Prototype On iPhone

APPLE WWDC, SAN FRANCISCO. June 10, 2008 — Nuance Communications, Inc. (NASDAQ: NUAN), a leading provider of speech solutions, today unveiled a ground-breaking prototype for voice search capabilities shown on the Apple iPhone (NASDAQ: AAPL).


The newly designed application introduces a new, more compelling consumer and search experience. Through Nuance speech recognition servers, mobile consumers — with no training required — can simply speak requests into their phone like “Find the Apple store in Boston, Massachusetts,” “Score of the Boston Celtics game,” or “Play Hannah Montana Best of Both Worlds” to quickly and accurately search the mobile web or in the future dictate an IM, SMS or e-mail message. The prototype, code-named “OVS” for open voice search, will allow mobile operators to offer simple ‘say anything’ search capabilities and is search engine agnostic, able to link to any search engine of an operator’s choosing. A video demonstration of the new application can be found at www.nuance.com/mobilesuite.



VoiceSignal Voice Enables iPhone in Proof of Concept Development

WOBURN, Mass., August 24, 2007– VoiceSignal Technologies, Inc., a leading supplier of speech recognition solutions, today announced that VoiceSignal engineers have ported several of VoiceSignal’s applications to the iPhone. These initial proof-of-concept applications include VSearch (mobile local search by voice) and VTunes (voice enabled music player).

The video demonstrations can be found either on the VoiceSignal website (www.voicesignal.com) or on the following YouTube links:

VTunes: http://www.youtube.com/watch?v=zne4rwCCmAc
VSearch: http://youtube.com/watch?v=ayrCCw5xWug


I am surprised by Nuance's marketing on this issue. I hope someone from Nuance will wake up and clear things up, perhaps by adding a comment that helps us understand their offering.




SpeechCloud's VoiceDialer (free) was the first application to attempt voice dialing on the iPhone. Like the AT&T and Nuance approaches, VoiceDialer takes advantage of the iPhone's always-on internet connection: it records your voice and sends it to SpeechCloud's servers, which perform the actual recognition. Once your speech is recognized, the application pulls up the contact's name and lets you select which number to dial. Critics of the application point to the amount of manual interaction (tapping on buttons) needed to actually dial a number, and to slow response times caused by transferring data across wireless networks.

VoiceDial (by Makayama, $14.99 on the App Store) avoids actual speech recognition and instead performs audio comparison. VoiceDial requires you to record your own voice for each contact; these recordings are later matched against your voice command. If you are willing to pay the $15 and to record yourself saying each contact's name, the Mercury News claims the product "works as advertised" and "had no problems recognizing the contact I wanted to call, even when it was similar to other names I'd recorded."
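The "record each contact, then compare" approach can be illustrated with dynamic time warping (DTW), a classic technique for speaker-dependent word matching. This is only a sketch under stated assumptions: Makayama has not said how VoiceDial compares recordings, so DTW is my stand-in, and I assume each recording has already been converted into a sequence of feature vectors (for example MFCC frames). The helper names `dtw_distance` and `best_contact` are hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Assumption: each recording has already been turned into a sequence of
# feature frames (e.g. MFCC vectors); feature extraction is not shown here.
Frame = Sequence[float]


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(query: List[Frame], template: List[Frame]) -> float:
    """Dynamic time warping cost of aligning a spoken command with a stored template."""
    n, m = len(query), len(template)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i query frames
    # with the first j template frames
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(query[i - 1], template[j - 1])
            # Either match both frames, or stretch one sequence in time
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    return cost[n][m]


def best_contact(query: List[Frame], templates: Dict[str, List[Frame]]) -> str:
    """Hypothetical helper: return the contact whose recorded template is closest."""
    return min(templates, key=lambda name: dtw_distance(query, templates[name]))


# Toy one-dimensional "features" for two recorded contact names (made up)
templates = {
    "Alice": [[0.0], [1.0], [2.0]],
    "Bob": [[5.0], [5.0], [4.0]],
}
# The same word as "Alice", spoken more slowly (some frames repeated)
command = [[0.0], [0.0], [1.0], [2.0], [2.0]]
print(best_contact(command, templates))  # prints Alice
```

Because DTW stretches and compresses the time axis, a command spoken faster or slower than the stored recording can still match its template. That is why this style of matching can work without a general-purpose recognizer, but only reliably for the voice that recorded the templates.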

HRL Technologies' VoiceThis Dialer ($9.99) actually tries to perform speech recognition on the iPhone itself, so no wireless connection is required. VoiceThis Dialer promises completely hands-free operation, letting you dial contacts and even quit the application by voice.

Fonix Speech is currently developing iSpeak, which includes a run-time engine that sits on the phone, allowing users to interact with the personal contents of their Apple iPhone™. Unlike other voice applets that enable voice search of the Internet by sending commands over the airwaves, this client-side application gives users the power of voice interaction with their personal content and eliminates network latency. Fonix iSpeak™ connects the user by just saying the phone number or by saying the name of a person in the contacts database. Additionally, users will be able to navigate their music libraries and launch a song or playlist simply by saying the name of the artist, song, or playlist.


To summarize, we are facing a proliferation of speech recognition applications on the iPhone.
Two keys to the evolution of speech recognition on the iPhone are 3G capability (which provides a fast channel to server-side computing) and the opening of the platform to third-party applications, both released recently. Now that these two criteria are met, we should expect rapid growth in the availability of speech recognition applications for the iPhone.