Monday, March 24, 2008

Macintosh speech recognition

It seems like ages since the Macintosh days. It is great to find this speech recognition ad, which is still appealing to us. After more than 10 years, not much progress has been made in such applications.

Are there signs of a similar application that really works?

Saturday, March 22, 2008

$100M to SpinVox

Congratulations to SpinVox on their $100M round. This announcement, together with a very effective marketing campaign, brought SpinVox into the news on many channels. Many people have asked me about this company: is the technology finally mature enough to handle the speaker-independent speech recognition task in a noisy environment and over highly compressed channels? People are especially interested in speech analytics once it is mentioned together with the magic words Facebook or Twitter.

Various companies are reviving the speech recognition market with new applications, and in some cases they leave laymen confused about whether it is a new technology or a new application. It reminds me of one of my favorite Dilbert cartoons, which I am attaching.

In recent posts at TechCrunch, GigaOM and many others, SpinVox is mentioned alongside the other companies active in the voicemail speech-to-text market: GotVoice, Simulscribe, Jott, Yap, Vlingo.

In a future post, I will focus on the technological differences between these companies (as far as one can understand them from their marketing material). For now, I will just comment that a $100M round is not standard for a technology company. For a service company, however, it is, and that is actually the key to understanding the SpinVox operation. Speaker-independent transcription is not 100% accurate, and it is not even 90%. When noisy channels and compressed telephony are taken into account, the numbers drop much further. A voicemail transcription service therefore requires humans in the process for verification. Verification requires listening to the message, which can be done at most about 1.3 times faster than real time. This is very similar to the time it takes a trained person to transcribe the call, so the technology cannot help much. So what is the technological gap? It is mainly to ensure that the transcribers do not need much training, which is what low-wage, high-turnover operations usually require.
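To make the arithmetic concrete, here is a minimal sketch of the human time per message. It assumes that "1.3 times faster" means a 1.3x playback speed, and it introduces a hypothetical `transcribe_ratio` parameter (seconds of manual transcription work per second of audio) that does not come from any SpinVox material.

```python
def verification_seconds(message_seconds: float, playback_speedup: float = 1.3) -> float:
    """Minimum human time needed to verify a message by listening to it.

    Assumption: the verifier listens to the whole message once,
    played back `playback_speedup` times faster than real time.
    """
    if playback_speedup <= 0:
        raise ValueError("playback_speedup must be positive")
    return message_seconds / playback_speedup


def human_time_saved(message_seconds: float,
                     transcribe_ratio: float,
                     playback_speedup: float = 1.3) -> float:
    """Seconds of human work saved by verifying instead of transcribing.

    `transcribe_ratio` is a hypothetical figure: seconds of manual
    transcription effort per second of audio for a trained person.
    """
    manual = message_seconds * transcribe_ratio
    verified = verification_seconds(message_seconds, playback_speedup)
    return manual - verified


if __name__ == "__main__":
    # A 30-second voicemail, with a transcriber assumed to work in real time.
    print(round(verification_seconds(30.0), 1))
    print(round(human_time_saved(30.0, transcribe_ratio=1.0), 1))
```

Under these assumptions the saving per message is only a fraction of its length, which is the point made above: as long as a human has to listen to every message, recognition accuracy barely changes the labor cost.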

Everyone involved in speech analytics believes that this voice thing is gonna be big. Yet we should be realistic about what can be achieved and when.

Tuesday, March 18, 2008

SpinVox Targets Cambridge for Speech Recognition Skills

Good news for SpinVox - Do you think that voicemail to text will eventually work without human intervention?

Dr. Tony Robinson, world-renowned academic and entrepreneur, joins to build new centre of expertise

LONDON & ATLANTA---SpinVox, the founder and global leader in Voice-to-Screen messaging, has announced that Dr. Tony Robinson has joined the company as director of its Advanced Speech Group (ASG).

Robinson previously provided SpinVox with expert advice on its speech technology strategy along with Phil Woodland, a Professor of Information Engineering at Cambridge University's Machine Intelligence Laboratory and coordinator of the Speech Research Group. Woodland retains his role as a consultant to SpinVox.

SpinVox will relocate some of its existing team of Automated Speech Recognition (ASR) experts from its world headquarters in Marlow, Bucks to its new ASG centre in Cambridge.

Robinson's remit will be to further build a world-class team from the ASR expertise that is concentrated in the Cambridge area. Under his leadership, the SpinVox ASG will further develop the Voice Message Conversion System that is at the heart of SpinVox services.

Robinson is an established, internationally-known academic, originally working at the University of Cambridge, who has successfully made the transition to entrepreneurship. He was formerly founder and owner of Cantab Research Ltd, and CTO of Zentian Ltd, both of which are involved in high accuracy real-time speech recognition in challenging environments.

"Tony is one of only a handful of people who have a complete academic and commercial understanding of all aspects of speech recognition," says Christina Domecq, SpinVox co-founder and CEO. "We're delighted that Tony is joining us to help take us to the next stage of our growth worldwide."

Cambridge and the HTK

Cambridge's global pre-eminence in speech recognition - which has resulted in a cluster of hundreds of voice specialists based at the University and in the specialist companies housed in the science parks and campuses that surround the institution - is based, in part, on the development at the University of the Hidden Markov Model Toolkit (HTK), the worldwide standard software for building speech recognition systems.

HTK, which was acquired by Microsoft in November 1999 as part of its acquisition of Entropic Inc., is now available free of charge to developers. Cambridge University is responsible for maintaining and developing the HTK.

SpinVox VMCS

"HTK Model technology is one of the foundations of SpinVox VMCS," adds Daniel Doulton, SpinVox co-founder and chief strategy officer. "Indeed it is at the heart of modern speech recognition systems. At a time when everyone, from industry giants such as Cisco, IBM and Microsoft to specialists such as Nuance, is focussing on voice, Cambridge is recognised as the centre of the speech recognition universe and that's why SpinVox is setting up there."

"SpinVox is the Google of speech - it has successfully cornered the market for voice conversion services and the accumulated resources it has assembled represent a huge opportunity for ambitious speech developers and researchers to build their careers," emphasises Robinson. "The company's VMCS voice message conversion system is the most advanced of its kind and I believe we have seen only the beginning of its huge potential."

VMCS works by combining state-of-the-art speech technologies with human intelligence and learning. A fully automated system, it "knows what it doesn't know" and is able to call for assistance when required. VMCS is continually evolving and currently converts messages in English, French, Spanish and German.
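The press release does not say how VMCS decides when to call for assistance. One common way to build a system that "knows what it doesn't know" is confidence-based routing, sketched below purely as an illustration. The `Hypothesis` class, the `route` function and the 0.9 threshold are all hypothetical and are not taken from SpinVox.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """A recognizer output with a (hypothetical) confidence score in [0, 1]."""
    text: str
    confidence: float


def route(hyp: Hypothesis, threshold: float = 0.9) -> str:
    """Return "auto" to deliver the text as-is, or "human" to ask for help.

    Assumption: the recognizer reports a calibrated confidence, and any
    message scoring below `threshold` is handed to a human agent.
    """
    return "auto" if hyp.confidence >= threshold else "human"


if __name__ == "__main__":
    for hyp in (Hypothesis("call me back at five", 0.96),
                Hypothesis("meet me at the uh station", 0.52)):
        print(hyp.text, "->", route(hyp))
```

The threshold sets the trade-off: raising it sends more messages to humans and improves accuracy at a higher labor cost.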

Cambridge Connectionist Speech Group

After completing his PhD, and following his subsequent appointment as SERC Advanced Research Fellow, Robinson built the Cambridge Connectionist Speech Group at Cambridge University. In the 1990s the group participated in projects including the TREC Spoken Document Retrieval tracks and the DARPA speech recognition evaluations. In 1995, Robinson was appointed a lecturer in the Department of Engineering and simultaneously founded SoftSound.

In its first five years, SoftSound achieved the first deployment of automatic subtitle generation - on the BBC's "EastEnders". From 1997 to 2000 SoftSound was a key partner in the EU-funded THISL project, which created the first audio indexing and retrieval system based on large vocabulary speech recognition.

In May 2000, Autonomy invested in SoftSound, which provided access to worldwide markets and resulted in rapid expansion. Central to SoftSound's success is a patented algorithm for speech recognition which allows faster operation with less memory usage.

The author of over 100 academic papers and holder of three patents, Robinson has competed in marathons in London, New York, Paris, Amsterdam, Inverness, Snowdonia and in the Cambridge area. He is also a keen fell runner, having conquered Snowdon and other Welsh and Scottish peaks, and raced across the North Yorks Moors.

To find out more about SpinVox go to

About SpinVox

SpinVox® brought together the two most popular methods of communication - voice and text - and created a new category of messaging called Voice-to-Screen. Its award-winning service is now making everyday communication simpler and more powerful, creating new recurring revenues for wireless, landline, cable and VOIP carriers as well as service providers and web partners. SpinVox has already launched its service with Alltel, Cincinnati Bell, Rogers Wireless, Sasktel, Telstra, Telus, Vodafone Spain, Vodacom South Africa and Six Apart and announced a deal with Skype. As a managed service provider, SpinVox can be implemented rapidly and cost-effectively by any network or service.

At the heart of SpinVox is its Voice Message Conversion System (VMCS), which works by combining state-of-the-art speech technologies with a live-learning language process. VMCS is being rolled out across four continents in four languages - English, French, Spanish and German.

Monday, March 17, 2008


Speech recognition without speech.

By picking up nerve signals, Audeo understands 150 words and phrases.

Can it be used for real applications?

Hawthorne Videoactive Report


Speech analytics is growing constantly, enabling various applications and at the same time pushing the basic technology forward. This blog is an attempt to create interaction between speech analytics professionals around the globe and to provide an open platform for promoting new products, technologies, conferences and ideas. Any member of the speech analytics group on LinkedIn can add posts to this blog.

It will take some time to bootstrap this blog and bring it to the attention of the professional gang. Be patient and promote your views and needs.