Sunday, March 15, 2009

Google Voice - the new threat to the reputation of speech recognition systems

Last week Google announced new enhancements to GrandCentral and, in effect, its evolution into Google Voice. In general, it is an application for better managing your voice communications.

The new application improves the way you use your phone. You can get transcripts of your voicemail (see the video below) and archive and search all of the SMS text messages you send and receive. You can also use the service to make low-priced international calls and easily access Goog-411 directory assistance. These come in addition to the standard GrandCentral features, including a single number that rings your home, work, and mobile phones, a central voicemail inbox that you can access on the web, and the ability to screen calls by listening in live as callers leave a voicemail. You'll find these features, and more, in the Google Voice preview. Check out the features page for videos and more information on how these features work. It is great to have visual voicemail, and this will further enhance the iPhone and similar smartphones.




Google is taking its voice recognition into prime time and into a very delicate position, as it exposes all the voice transcripts loud and clear with no option for human correction. I can only guess what will happen to voicemails in foreign languages, with heavy accents, etc. We've all seen the mistakes made by the Google speech recognition system used for mining the presidential campaign in the US, and trust me, most of us do not speak as clearly as Obama.

So as a user, you have several options for getting a transcription of your voicemail: SpinVox, Nuance, and the smaller players (SimulScribe, Jott, etc.), who rely on human review, and Google, which says that there may be errors but that it is the user's decision whether to rely on the provided transcripts.

I anticipate that frustration with the quality of Google Voice transcription is going to be a source of bad attitudes towards speech recognition technology (see the Dilbert cartoon I posted on this blog in the past). While other systems had mechanisms to "hide" embarrassing mistakes, here the entire transcript will be visible and may relate to mission-critical information.

My recommendation to Google is to open an interface for providing manual transcripts. This could either be connected to the SpinVox API (announced recently) or be a new interface that allows sending a proposed transcript plus audio and receiving back a corrected transcript. This could open a new market for people who are willing to send their voice communications to remote secretaries (probably in India) who will transcribe the audio and return it to Google Voice. It could also be a perfect fit for Amazon’s Mechanical Turk service, where you can get people to perform simple tasks. If you want further info, contact me directly about the way it should be constructed, including adaptive language ID etc.
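
To make the proposal a bit more concrete, here is a rough sketch in Python of what such a correction interface could look like: a request carrying the audio and the machine transcript, and a response carrying the corrected text. Everything in it is hypothetical. The names CorrectionRequest, CorrectionResponse, request_correction and language_hint are my own inventions for illustration, not part of Google Voice, the SpinVox API or Mechanical Turk, and the "reviewer" is just a stand-in for whatever human service would do the actual work.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CorrectionRequest:
    """Hypothetical payload sent to a human transcription service."""
    audio: bytes                # raw voicemail audio
    proposed_transcript: str    # machine-generated transcript to be checked
    language_hint: str = "en"   # assumed field; could come from language ID


@dataclass
class CorrectionResponse:
    """Hypothetical payload returned by the human transcription service."""
    corrected_transcript: str
    changed: bool               # True if the reviewer altered the text


def request_correction(
    req: CorrectionRequest,
    reviewer: Callable[[bytes, str], str],
) -> CorrectionResponse:
    """Pass audio plus proposed transcript to a reviewer and wrap the result.

    `reviewer` stands in for a remote human service; here it is simply a
    callable that returns the corrected text.
    """
    original = req.proposed_transcript.strip()
    corrected = reviewer(req.audio, req.proposed_transcript).strip()
    return CorrectionResponse(
        corrected_transcript=corrected,
        changed=(corrected != original),
    )


if __name__ == "__main__":
    # A fake reviewer that fixes one typical recognition error.
    def fake_reviewer(audio: bytes, proposed: str) -> str:
        return proposed.replace("ate", "eight")

    request = CorrectionRequest(audio=b"\x00\x01", proposed_transcript="call me at ate")
    print(request_correction(request, fake_reviewer))
```

The point of the "changed" flag is that the service could show the user whether a human actually touched the transcript, which is exactly the kind of signal that is missing today.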
