Sunday, September 20, 2009

Nuance and Lawsuits - again

In addition to the IBM engine, which has strong patents behind it, Vlingo is leveraging its new relationship with AT&T, as both an investor and a speech engine (Watson) provider, in preparation for the Nuance lawsuit. The IBM engine is licensed to Vlingo via Nuance (and this is why they are suing them). Moreover, Vlingo is not using this engine anymore. It seems like lawyers' tricks rather than technology.

AT&T Backs Vlingo as Nuance Lawsuit Looms

AT&T has taken a minority stake in Vlingo in a move that could have major repercussions for Nuance’s patent infringement suit against the voice navigation startup. As part of the deal, Vlingo will integrate its offerings with AT&T’s Watson, a core speech recognition technology that serves as a foundation for voice-activated products. Vlingo will all but abandon the IBM-developed technology that it had been using — and which is at the heart of Nuance’s lawsuit.

“Our goal is to move everything to AT&T,” Vlingo CEO Dave Grannan told me this morning. “If Nuance decides to proceed, they’ll essentially be suing us for violating patents — and this is the crazy thing — and the alleged violation occurs in the IBM engine Nuance licensed to us and, by the way, we don’t use anymore.”

Nuance executives were not immediately available for comment.

Grannan said the Watson engine is “superior” to the IBM-developed technology and will help Vlingo build better voice navigation offerings. AT&T and Vlingo will begin rolling out new products later this year for AT&T customers and plan to market the joint solution to other industry players, including device manufacturers and other carriers.

The cutthroat nature of the speech recognition space underscores the increasing attention it’s attracting from investors, as voice is positioned as a superior navigation tool to device keypads and touchscreens – especially for users behind the wheel. A Boston-area outfit, Vlingo began to gain traction in early 2008 when Yahoo tapped it to add a speech recognition component to its oneSearch offering. The agreement – which followed Vlingo’s $6.5 million Series A round in 2007 – saw Yahoo lead a second round of funding that brought in $20 million.

But Vlingo isn’t the only startup getting funded; the space has attracted hundreds of millions of dollars, and competition is fierce. Microsoft Corp. joined the field two years ago with its $800 million acquisition of Tellme, whose technology it’s using to incorporate speech into its upcoming Windows Mobile 6.5. Google has invested heavily as well, and is deploying voice navigation with its Android platform. As for the voice recognition companies themselves, outside of Vlingo, smaller players such as V-Enable and Promptu are also vying for attention. Meanwhile Nuance, the dominant pure play on the field, has spent $1 billion or so in acquisitions.

- By Colin Gibbs | Gigaom

Tuesday, July 14, 2009

Nuance acquires Jott (without suing them first :-)

Aiming to improve its human-based transcription, Nuance just acquired Jott. It is not about the technology, so why? Probably a minimal price for Jott's assets (not disclosed) and some methodology and interfaces for some external applications such as
If you know more, please share it with us.

Nuance Acquires Jott, Expands Mobile Portfolio

Innovative Jott Service to Deliver Powerful New Voice-to-Text Capabilities to Mobile Operators and En

BURLINGTON, Mass. & SEATTLE--(BUSINESS WIRE)--Nuance Communications, Inc. (NASDAQ: NUAN - News) today announced it has expanded its Mobile Division voice services portfolio with the acquisition of Jott. Jott is the innovator behind the popular Jott Assistant, the simple and easy-to-use service that enables users to create notes, set reminders and appointments, send email and text messages, and post to their favorite web services – all by voice, from any device.

“Jott’s voice-to-text offerings have experienced a groundswell of adoption and positive industry recognition since the company’s inception, and we’re thrilled about the opportunity to expand our market reach and our voice services portfolio,” said Michael Thompson, senior vice president and general manager, Nuance Mobile. “Together we will deliver a range of new services to our mobile operator and enterprise customers.”

The combined Nuance and Jott teams will focus on several key voice-to-text initiatives:

  • The innovative Jott Assistant service has been adopted by hundreds of thousands of users, providing vast insight into the demands of today’s mobile users. To further extend the power of Jott across the mobile mass market, Nuance plans to package and offer Jott Assistant to mobile operators as part of its voice services portfolio, including Nuance Voicemail-to-Text.
  • By combining its easy-to-use voice services with email, text messaging and a variety of web services, Jott’s service has advanced mobile productivity in the enterprise market. Nuance, together with its Enterprise Unified Communications partners, will offer a secure, highly scalable and differentiated Enterprise package including Nuance Voicemail-to-Text, Messaging, and Collaboration.
  • As used in Jott for Salesforce, Jott provides open APIs that allow for voice integration with third party CRM providers and other critical enterprise applications that require mobile access. Nuance will continue to mainstream and expand the CRM partner program through its existing CRM partnerships, enabling enterprise application providers to better meet the needs of the evolving professional mobile market.

“We’ve seen dramatically increased demand for our mobile voice solutions, because they offer real business value, are easy to deploy, and are a delight to use,” said John Pollard, co-founder, Jott. “Nuance has consistently delivered groundbreaking mobile applications to billions of people worldwide. Our combined expertise will bring innovative and differentiated voice services to a variety of markets with tremendous scale.”

All of Jott’s services, including Jott Assistant, Jott Voicemail and Jott for Salesforce, will remain available, and existing customers will experience no interruptions in service. For more information and to access Jott services, visit

About Nuance Communications, Inc.

Nuance is a leading provider of speech and imaging solutions for businesses and consumers around the world. Its technologies, applications and services make the user experience more compelling by transforming the way people interact with information and how they create, share and use documents. Every day, millions of users and thousands of businesses experience Nuance’s proven applications and professional services. For more information, please visit:

About Jott

Headquartered in Seattle, WA, Jott Networks is the world leader in mobile voice-to-text applications. Jott allows individuals and businesses to easily capture thoughts, send emails and text messages, set reminders, organize lists, and post to web services and business applications – all with their voice, using any phone. Jott also converts voicemail into email and text messages, making voicemail a more productive tool. Since being founded in 2006 by John Pollard and Shree Madhavapeddi, Jott has made world-class voice transcription accessible to anyone with a cell phone. For more information on the Jott service, visit

Friday, July 3, 2009

Google voice mail transcription quality - can you trust it?

Recently David F. Gallagher wrote in the NY Times about Google Voice. He asked people to leave him messages and posted the automatic transcriptions. While the service performs well in many cases, the results are far from accurate, and often one cannot even understand the gist of the message.
See "Pushing the limits of Google's speech recognition" and "Messing with Google's speech recognition, part 2".

To some extent, everybody knows it is experimental and does not really trust it. Is it useful? Probably just for fun, and for people who do not want any human involved in transcribing their private messages, as happens with SpinVox, Nuance or other such vendors.

Saturday, May 23, 2009

“Mainstream” Speech Analytics

Recently Verint announced a new offering for the "mainstream" market. Many people do not understand the importance of this announcement.

For years we have been pushing speech analytics to the market, and the adoption cycle is pretty long and thus can be expensive. It requires experts to learn the system and get the best out of it. There are by now many success stories for speech analytics. However, for a technology to become widely used, it needs additional simplicity: it must be operable by laymen, work out of the box, and provide benefits without much learning. This is exactly the step that Verint is taking with the new system, and I bet it will outperform any other offering in the market.

If you are considering deploying a speech analytics solution, even for a small contact center, this should be your first option.

 Verint Systems Inc.

Verint Witness Actionable Solutions Takes Speech Analytics “Mainstream” with New Impact 360 Speech Analytics Solution

Breakthrough Technology Enables Cost-Effective Deployment of Speech Analytics, Helping Companies React Rapidly to Changes in Customer Behavior, Reduce Expenses and Enhance the Customer Experience

New Speech Analytics “Essentials” Solution Enables Small and Medium-Sized Contact Centers to Attain Speech Analytics Benefits Previously Out of Reach

Driving Innovation 2009
Verint Witness Actionable Solutions’ 13th Annual Global User Conference
Loews Lake Las Vegas Resort, Nevada

LAS VEGAS & MELVILLE, N.Y.--(BUSINESS WIRE)--Verint® Systems Inc. today announced the availability of its new Impact 360® Speech Analytics solutions. Simple and cost effective to deploy, the software from Verint® Witness Actionable Solutions® is bringing speech analytics technology into mainstream contact center operations without costly setup overhead, lengthy consulting engagements and the need for interpretation by separate analyst staff.

The new Impact 360 Speech Analytics enables businesses to mine recorded customer interactions to surface the intelligence essential for building effective cost containment and customer service strategies. Designed to provide rapid insight into changes in customer behaviors and patterns, the solution can deliver value right out-of-the-box by helping remove guesswork from the customer service equation. Impact 360 Speech Analytics proactively identifies call drivers — along with the related product, process and service issues that often originate in areas outside the contact center, such as back-office functions — emerging trends, opportunities and competitive influences, and can then make that business intelligence available enterprise-wide.

The newly introduced Impact 360 Speech Analytics Essentials solution enables businesses with small to medium-sized contact centers to cost-effectively achieve these speech analytics benefits that were previously out of reach.

“We believe this is a technological breakthrough that has the potential to create a new wave of adoption for speech analytics, breaking down previous barriers to entry, such as set-up and configuration,” explains Nancy Treaster, senior vice president and general manager, Verint Witness Actionable Solutions. “We know that the contact center holds a wealth of valuable business intelligence, and speech analytics is an automated, powerful solution that can help companies not only get to that information, but then determine what actions to take. Impact 360 Speech Analytics Essentials allows organizations with small and medium-sized centers to analyze captured conversations and rapidly benefit from valuable insight.”

Robust, New Functionality Extends Verint Leadership in Analytics

The patent-protected Impact 360 Speech Analytics Essentials, and the Impact 360 Advanced Speech Analytics solution, leverage both phonetics and LVCSR recognition — the best combination of both speed and accuracy — adding meaning and context to every word and phrase identified in every call processed without predefinition of terms.

Key, patent-pending functionality featured in Impact 360 Speech Analytics Essentials, as well as Verint’s Impact 360 Advanced Speech Analytics solution, is driven by the company’s proprietary Complete Semantic Index™ technology that features such functionality as:

  • Automated Trend Analysis, Surfacing Changes in Customer Behaviors

Using hundreds of thousands of term and phrase combinations, the new Complete Semantic Index automatically identifies significant changes in customer behavior as expressed within recorded customer interactions. Such changes are proactively surfaced by the software’s Automated Trend Analysis, which identifies increases or decreases in terms and phrases used during customer/agent conversations. Delivered daily, trend reports can be tracked for up to 18 months.

  • Guided Search and Context Visualization

With the Complete Semantic Index, users do not need to know in advance what terms to search. Intuitive search engine-like, guided search capabilities — including contextual suggestions and search visualization functionality — help business users find relevant calls quickly to determine the underlying causes of rising call volumes, costs and customer dissatisfaction.

  • Analytics-Driven Unification with Workforce Optimization Suite Via Native Integrations

The Impact 360 Speech Analytics solutions, part of Verint’s patent-protected, unified Impact 360 Workforce Optimization suite, can use the content of captured customer interactions to route contacts of interest to users throughout the enterprise, such as quality supervisors, marketing managers and customer retention teams.
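The release does not describe how Automated Trend Analysis works internally. As a rough illustration of the general idea only, the following minimal Python sketch compares how often each term appears in transcripts from two periods and ranks terms by the size of the change. The function names, the `min_count` threshold and the sample transcripts are all assumptions invented for this example, not Verint's method or data.

```python
from collections import Counter


def term_rates(transcripts):
    """Return raw counts and relative frequencies of each word."""
    counts = Counter(
        word for text in transcripts for word in text.lower().split()
    )
    total = sum(counts.values()) or 1  # avoid dividing by zero on empty input
    return counts, {term: n / total for term, n in counts.items()}


def term_trends(previous, current, min_count=2):
    """List (term, change in relative frequency), largest change first."""
    prev_counts, prev_rates = term_rates(previous)
    cur_counts, cur_rates = term_rates(current)
    trends = []
    for term in set(prev_counts) | set(cur_counts):
        # Skip terms too rare to say anything about (hypothetical threshold).
        if prev_counts[term] + cur_counts[term] < min_count:
            continue
        change = cur_rates.get(term, 0.0) - prev_rates.get(term, 0.0)
        trends.append((term, change))
    trends.sort(key=lambda item: abs(item[1]), reverse=True)
    return trends


# Invented transcripts for two reporting periods.
last_week = ["i want to check my balance", "check my balance please"]
this_week = ["i want to cancel my account", "cancel my account now", "please cancel"]

for term, change in term_trends(last_week, this_week)[:3]:
    print(f"{term}: {change:+.3f}")
```

A real system would work on phrases rather than single words and would need far more data, but the sketch shows why a rising term such as "cancel" can surface without anyone defining it in advance.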

“We’re excited to introduce this solution, as Verint is fulfilling a critical market need in enabling businesses with small and medium-sized centers to reap the same type of intelligence that their larger counterparts have used for years — all at an attractive price point, and at a time in our economy in which cost containment, retention and the customer experience are paramount,” adds Treaster.

Specifications and Availability

Optimally sized for centers with 50 to 300 agent seats, Impact 360 Speech Analytics Essentials can operate on a single box. As businesses’ needs change and grow, the solution can be easily upgraded to Verint’s advanced speech offering designed for larger centers, scaling up to 50,000 seats.

In addition to the new Complete Semantic Index, Automated Trend Analysis and Guided Search functionality, the Impact 360 Advanced Speech Analytics solution for larger center operations includes Automated Root Cause Analysis with patent-pending TellMeWhy™ functionality. With this capability, the solution can identify potential underlying drivers of specific call types, such as customer complaints or long calls, prioritize the top five root cause groups and automatically suggest the top instigators for each call set.

The Impact 360 Advanced Speech Analytics solution also features Speech Analytics-driven Scorecards designed to help centers better manage performance by balancing cost drivers with customer satisfaction drivers. Peer-based agent comparisons factor in key performance indicators (KPIs), such as customer complaints and repeat call drivers, that are identified from speech analytics results, along with proactive alerts on defined thresholds. Other functionality includes native business integrations with Impact 360 Quality Monitoring’s Smart Inbox— which delivers recorded interactions directly to the desktop based on defined criteria that can include speech analytics categories — and Impact 360 Data Analytics, which analyzes call attributes to help uncover contact scenarios that can positively or negatively impact organization and/or individual KPI achievements.

About Verint Witness Actionable Solutions

Verint® Witness Actionable Solutions® is the leader in analytics-driven workforce optimization software and services. Its solutions are designed to help organizations capture customer intelligence, uncover business trends, discover the root cause of employee and customer behavior, and optimize the customer experience. From contact centers to remote office, branch and back-office operations, its award-winning, next-generation Impact 360® Workforce Optimization suite is the industry’s most unified solution set — featuring quality monitoring and recording, workforce management, speech analytics, data analytics, customer feedback surveys, performance management, eLearning and coaching. Impact 360 helps improve the entire customer service delivery network, powering the right decisions to help ensure service excellence and transform organizations into customer-centric enterprises.

About Verint Systems Inc.

Verint Systems Inc. (VRNT.PK), headquartered in Melville, New York, is a leading provider of Actionable Intelligence® solutions for an optimized enterprise and a safer world. Today, more than 10,000 organizations in over 150 countries rely on Verint solutions to perform more effectively, build competitive advantage and enhance the security of people, facilities and infrastructure. Visit us at our website

This press release contains "forward-looking statements" within the meaning of the Private Securities Litigation Reform Act of 1995, including statements regarding expectations, predictions, views, opportunities, plans, strategies, beliefs, and statements of similar effect relating to Verint Systems Inc. These forward-looking statements are not guarantees of future performance and they are based on management's expectations that involve a number of risks and uncertainties, any of which could cause actual results to differ materially from those expressed in or implied by the forward-looking statements. For a detailed discussion of these risk factors, see the Company's Current Report on Form 8-K filed with the Securities and Exchange Commission on September 10, 2007, as supplemented by our Current Reports on Form 8-K filed on November 5, 2007, January 16, 2008, and April 9, 2008 and the Form NT-10Q filed on April 16, 2009. The forward-looking statements contained in this press release are made as of the date of this press release and, except as required by law, the Company assumes no obligation to update or revise them or to provide reasons why actual results may differ.


Saturday, April 11, 2009

Google's 'Voice Search' comes to India

Nice post on siliconindia:
I especially liked the example of searching for 'whether' :) - it must be phonetic. Anyway, given the heavy accents of the south and north and the various sub-languages, there is a lot of work for Google India just in collecting all this data.

Google's 'Voice Search' comes to India
By siliconindia news bureau
Friday,10 April 2009, 15:09 hrs

Bangalore: Now search is just a call. Google has set path for a new era in the speech analytics with 'voice search', which it recently introduced in Hyderabad, Delhi, Mumbai and Bangalore. One just needs to speak a single word into the phone, such as 'whether' or 'hotel' and one can get the top results.

The 'voice search' uses a combination of automated voice recognition engine and operators to provide this facility. To make the service faster and better, Google is also experimenting with voice recognition technology, which will ensure 24-hour support. Currently, the automated system offers results in English, but the operator-driven system offers results in only Hindi and Telugu.

"This is in line with our mission of making information universally useful and accessible, be it at home or on the go," explains Hugo Barra, Group Product Manager, Google Mobile. Not all those who make queries, though, will get accurate results, since the project is still in its pilot stage, reported Business Standard.

Google's logic is a simple one. Mobiles outnumber personal computers (PCs) in India. Besides, just about 5-7 percent of the population has an internet connection, including those who surf the net via their mobiles. "Voice enables India to reach non-web users in local languages even as our core strength is search," Barra adds.

In the U.S., Western Europe and Japan, the 'voice search' feature is available under the Google Mobile App for the iPhone. It is also available on the Android-based T-Mobile G1, and was introduced on the BlackBerry as a free download last month.

In India, though, Google plans to extend the technology to other cities once it is confident in the quality of its speech recognition technology "in any region of the country", since the number of languages and accents in India are very diverse and distinct from each other.

The company is currently not making any revenue on this service in India though it monetises this in the U.S., Western Europe and Japan through ads which appear when the results show up. It does not charge the user for information received or for connecting them to businesses. The local business information used by Google is the same as that on local search. Data is continuously being added, and Google is collecting feedback from the users, Barra explained.

Talking about the company's mobile strategy, Vinay Goel, Country Head of Products, Google India, said, "There has been a significant increase of mobile search users in 2009. We believe that users graduate from plain voice search to an SMS-based one and finally to internet-based search, which is our goal."

The search giant has also made Orkut (its social networking site) and Google Maps available on mobiles. Besides, it also has the Google Latitude (to help locate one's friends) feature as an opt-in download.

Tuesday, March 31, 2009

Google Voice should hire Spinvox for message transcription

I just saw this post at:

I guess it is the layman's feedback on the Google Voice service, and exactly what I anticipated in:
Google Voice - the new threat to the reputation of speech recognition system.

Google Voice should hire Spinvox for message transcription

March 30th, 2009

I have been using GrandCentral for years and when they announced Google Voice I was elated. I could not wait for the new features including my absolute favorites being SMS and voice mail being transcribed to text and emailed and texted to you.

Well, the SMS is great, but the voice transcription is dreadful. Last week I didn't call someone back, mistakenly thinking it was a telemarketer. It was Bena Roberts of GoMoNews fame, a very important mover and shaker in the business. This kind of transcription mistake is not acceptable. I have a suggestion for Google Voice that I really hope they listen to: OUTSOURCE TO SPINVOX. Now, I don't work for SpinVox, but I've used it, and given the very high regard dozens of friends have for the SpinVox service, I'm willing to give this advice.

Google, don't allow Google Voice to tarnish a perfectly fantastic service because you're not willing to spend a few dollars. Go test out SpinVox and if it beats your transcription (I guarantee it will) then please switch over.

TelStrat - TelStrat’s Engage™ Suite Now Understands Speech Analytics

Engage Analyze expands TelStrat's contact center product portfolio with true phonetic speech analytics intelligence

Orlando, Florida – March 30, 2009 – TelStrat, a global supplier of comprehensive contact center solutions, business call recording products, and leading-edge access network systems, today chose VoiceCon Orlando 2009 as the venue to announce Engage Analyze, the latest addition to its industry-leading contact center solution suite. Engage Analyze provides advanced speech analytics that equip organizations to transform voice calls into knowledge that can help them improve efficiency, increase compliance, and gain the competitive advantage.

Engage Analyze indexes and audio mines words and phrases buried in calls using a patented Phonetic Audio Search and Recognition Engine. Unlike older, less efficient speech-to-text approaches, phonetic speech search is not dependent on finite dictionary and grammar models which require constant maintenance. This makes it easy to accurately search for new competitors, product names, slang, and other dynamically changing terms.

Phonetic search technology also makes Engage Analyze fast – much faster than speech-to-text systems. Pre-processing or indexing of content is typically 60-80 times faster than real time, more than an order of magnitude faster than Large Vocabulary Conversational Speech Recognition (LVCSR) speech-to-text systems. Subsequent searches for words or phrases are incredibly quick, averaging over 30,000 times faster than real time and reaching rates up to 80,000 times faster.
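To put the quoted speed multiples in perspective, the arithmetic can be sketched in a few lines of Python. The 1,000-hour archive is an assumed size chosen for illustration; only the 60-80x indexing and 30,000-80,000x search multiples come from the release.

```python
def processing_seconds(audio_hours, times_faster_than_real_time):
    """Seconds needed to process audio at a given multiple of real time."""
    return audio_hours * 3600 / times_faster_than_real_time


ARCHIVE_HOURS = 1000  # assumed archive size, not a figure from the release

for label, speedup in [
    ("indexing at 60x", 60),
    ("indexing at 80x", 80),
    ("search at 30,000x", 30_000),
    ("search at 80,000x", 80_000),
]:
    seconds = processing_seconds(ARCHIVE_HOURS, speedup)
    print(f"{label}: {seconds:,.0f} s ({seconds / 3600:.2f} h)")
```

Under these assumptions, indexing the archive would take roughly 12.5 to 16.7 hours, while a single search over it would take about 45 to 120 seconds.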

Speech-to-text systems rely on limited, statistical sampling of calls, typically 3-5% of call volume, due to cost and complexity. The technology in Engage Analyze makes it possible to audio mine up to 100% of calls, in real time if desired. The product does all this without the massive computing power necessary for comparable LVCSR systems. With Engage Analyze, contact centers can now accurately analyze and recognize trends over thousands of hours of customer calls.

"With Engage Analyze we're bringing our customers advanced technology that provides them with a powerful tool to enhance business and customer intelligence," said TelStrat President Kevin Smith. "The search speed and recognition capabilities make this product a market leader, and we've made it affordable for organizations of virtually any size."

Engage Analyze is the newest component of TelStrat's Engage Contact Center Suite. Engage Suite blends full-featured voice and screen recording; intuitive agent performance evaluation, tracking and coaching; powerful agent scripting and call automation; sophisticated workforce forecasting and scheduling; and now, advanced speech analytics. It addresses each major aspect of contact center operations. Designed to benefit any 'center of contact', Engage Suite is ideal whether used by a large telemarketing firm or a small company's support staff.

Monday, March 30, 2009

speech recognition integration by Jott

Jott recently announced a speech interface to By leveraging their speech-to-text services, they came up with an offering for enterprise users. It is another example of how a speech-to-text API (human-based or automatic) can be easily integrated into other SaaS applications. See also SpinVox Open API. A key question around this service will be what happens when the data is inaccurate due to a transcription error (whether human or machine). What feedback will users receive from the system, and what is the process for fixing such errors? I guess the Jott people can shed light on some of that, but so can similar service providers like SpinVox, Nuance, Google etc.

Jott Networks Introduces Jott for Salesforce, Makes Mobile CRM Input Insanely Simple With Voice-to-text

SEATTLE, March 19 /PRNewswire/ -- Jott Networks today announced the addition of Jott for Salesforce to its expanding line of mobile productivity services. The new service uses Jott's high-quality voice-to-text technology, and allows sales professionals to make a simple call on any phone to directly input opportunity updates, take quick notes, and set reminders and appointments - all hands-free. Jott is offering a one-month free trial of Jott for Salesforce and it comes with a free subscription to Jott Assistant Pro, Jott's widely acclaimed mobile productivity tool.

Jott CEO and co-founder John Pollard said, "There are already over a thousand businesses that use Jott's other services to get more done on the go. These same businesses wanted us to provide integration with more critical business applications and Jott for Salesforce is the first in that line." He added, "Sales professionals can now use our best-in-class voice-to-text technology to avoid the hassle of cramped and clumsy mobile interfaces and frustratingly slow connection speeds. Sales teams spend more time selling and less time typing reports, and sales managers see greater adoption of Salesforce and receive better, fresher forecast data." Jott for Salesforce includes the following features:

Features for Sales Professionals

  • Update opportunities and accounts - Use your voice to quickly update entire opportunities with a simple flow or individual forecast fields with shortcuts. Data ends up in specific accounts with no cutting, pasting or forwarding required.
  • Take quick notes - While they are still fresh, speak quick notes about accounts and opportunities and add tasks to your Salesforce dashboard.
  • Schedule appointments and set reminders - Use your voice to book a meeting on your or Outlook calendar, and set reminders so you never forget.
  • Get confirmation of everything - Every update you leave comes with a confirmation email/text message so you know for certain that data was entered into your accounts.

Features for Managers

  • Set up in minutes - Jott for Salesforce is incredibly easy to set up. There are no desktop or phone downloads, and it requires no changes to your existing Salesforce set-up.
  • Scale easily - Jott for Salesforce was built for scale. With nothing to download or maintain, and no new equipment to buy, it easily accommodates individuals or organization-wide rollouts.
  • No training necessary - While training is available to help teams get the most out of Jott for Salesforce, only a few simple commands are needed to get started.

Pricing and Availability

Jott for Salesforce is available today from the App Exchange on and from the web site. It takes just a few minutes to set up, and is compatible with all carriers and all mobile phones in the US and Canada. Jott for Salesforce's pricing is straightforward and affordable at $25 per user per month. For that fee, users can send unlimited updates into Salesforce with no need to worry about overage charges.

For more information on Jott Salesforce and other Jott services, please visit

Sunday, March 29, 2009

Visual Speech Recognition: Lip Segmentation and Mapping

I just saw this free ebook, which may be of interest to the speech analytics community. See the link below.


Description: The unique research area of audio-visual speech recognition has attracted much interest in recent years as visual information about lip dynamics has been shown to improve the performance of automatic speech recognition systems, especially in noisy environments.

Visual Speech Recognition: Lip Segmentation and Mapping presents an up-to-date account of research done in the areas of lip segmentation, visual speech recognition, and speaker identification and verification. A useful reference for researchers working in this field, this book contains the latest research results from renowned experts with in-depth discussion on topics such as visual speaker authentication, lip modeling, and systematic evaluation of lip features.

Friday, March 20, 2009

SpinVox Open API

Congrats to SpinVox on this important move. Opening your system to others can drive speech applications forward faster by leveraging many more developers and creative minds. Whether it is human transcription or machine transcription, this move separates the speech processing part from the application part and pushes toward SaaS speech-enabled applications.


SpinVox to Demonstrate Open API Applications at CTIA Wireless 2009

Pre-Registrations For SpinVox Create Fuel Co-Development Program and Confirm Demand for Voice Conversion Services from Web-Based Technology Developers Wanting to Build Speech 2.0 Applications

CTIA Wireless 2009

LONDON & NEW YORK--(BUSINESS WIRE)--SpinVox, the global leader in voice to content messaging, will showcase three brand new Speech 2.0 applications at CTIA Wireless 2009, to be held April 1-3 in Las Vegas. The applications have been developed in less than a month to demonstrate the power of SpinVox Create, an open API (Application Programming Interface) to the SpinVox Voice Message Conversion System™ (VMCS), the world’s largest commercial speech platform.

SpinVox Create was announced at Mobile World Congress, Barcelona in February 2009 and developers were invited to pre-register their interest in SpinVox Create in advance of its launch via a web registration page -

SpinVox Create will be launched as a key part of a two-stage corporate API strategy that will also be announced at CTIA and rolled out by SpinVox in the first half of 2009.

Nearly 100 developers have already registered interest in SpinVox Create and, of these, 20 have been selected by SpinVox to be part of the co-development program. Those selected include business efficiency, personal productivity, games and social networking applications.

SpinVox Create is a simple, straightforward API that leverages SpinVox’s commercial speech platform – which is growing quickly with more than 30 million users - to enable any developer with Web access to quickly build commercial speech applications. It also enables SpinVox to collaborate with third parties to expand the Speech 2.0 market and foster further innovations in voice that complement SpinVox’s existing platform development services for Enterprise application partners and Carrier networks.

“We’ve been impressed by both the quality and quantity of responses to our pre-registration announcement,” says SpinVox co-founder and CEO Christina Domecq. “We are clearly seeing an increased demand for voice conversion services from technology developers who recognize that a speech interface enables the most natural form of communication, and who want to build Speech 2.0 applications with best-in-class open products.”

Demonstration of Voice innovation on Apple, Nokia, and Windows Mobile platforms

Three applications will be demonstrated at CTIA Wireless 2009. These are based on Apple iPhone, Nokia Series 60, and Microsoft Windows Mobile platforms.

`Travel Blog`, a Windows Mobile 6.0 Application developed by Singapore-based Global Idealogy Corporation, lets you tag and post your photographs using just your voice. You can select photographs through the application, speak a message, attach the converted text to the photographs, and post it on a blog or social networking website or send it as email or MMS.

`Speak-a-Text`, a Nokia Series 60 Application developed by UK-based Symbian Platinum Partner, Savage Minds, incorporates the ability to speak a message, which is converted to text and placed into the menu structure of the phone software.

`Memo`, an iPhone Application, developed by UK-based SpinVox allows iPhone users to speak a memo through the iPhone application and after conversion into text by the SpinVox VMCS the memo resides on the iPhone for instant access whenever needed.

Drive the next upturn

SpinVox has already received pre-registrations for SpinVox Create across the globe and looking ahead expects rapid uptake of the API particularly in Silicon Valley where SpinVox Web 2.0 services such as SpinVox Blog and SpinVox Social Networks have been increasingly popular.

Adds Domecq, “SpinVox has created a new category - carrier-grade voice conversion - and now is helping talented developers take advantage of the next growth opportunity in speech. The potential for innovation between carriers and the web is enormous – along with our own innovations we're now delivering a platform for creation of market changing applications and supporting their transformation to carrier-grade services. Speech 2.0 applications will be one of a cluster of innovations that will drive the next upturn as people are increasingly enabled to re-discover the power of their voice.”

About SpinVox

SpinVox® is the world's largest privately-held speech technology company, providing the only voice to text messaging services which are used daily by millions of people and whose user base has grown over twenty-fold in the last 12 months.

Through significant innovations in voice and network technologies which are protected by over 40 patents worldwide, SpinVox has converged the two most natural forms of communication - voice and text - to create the fastest-growing form of messaging: Voice-to-Content™.

SpinVox services are available directly on and through leading carriers and through new media, Unified Communications and other service providers globally.

Implemented as a carrier-class cloud service, SpinVox is proven to be able to easily create value from everyday user behavior using voice and deliver rapid and easy implementation of low input, sustained high reward services.

At the heart of SpinVox is its ground-breaking Voice Message Conversion System™ (VMCS), which works by combining state-of-the-art speech technologies with a live-learning language process. Developed by the Cambridge, UK-based SpinVox Advanced Speech Group, VMCS now serves users across five continents in English, French, Spanish, German, Portuguese and Italian.

SpinVox is now live with Alltel, Cincinnati Bell, Sasktel, Rogers Wireless, Telus, Telstra, Vodacom South Africa, Vodafone Spain, Movistar Chile, Skype and Livejournal.

Wednesday, March 18, 2009

SpinVox response to Google Voice - where is the Nuance response?

Following the recent Google Voice voicemail transcription announcement, SpinVox is responding. In a recent post, Rich Tehrani from TMC quotes a response from SpinVox about Google Voice:

Google is entering a marketplace that continues to be led by SpinVox, the world's largest privately-held speech technology company. We're excited by the launch of Google Voice because it will demonstrate the benefits of speech-to-text conversion and validate its deployment as a network service to an increased audience. We have already launched carrier-grade services with 13 operators - including recently with Skype - on five continents and SpinVox is in use by in excess of 30 million users. SpinVox's 97 percent accuracy in conversion is now the benchmark around the world. - Christina Domecq, co-founder and CEO of SpinVox

And this 97% figure reflects human results. I am not sure that Google will post any response, but we need to wait for usability feedback from users.

I also wonder what Nuance's response to Google Voice will be.

Sunday, March 15, 2009

Google Voice - the new threat to the reputation of speech recognition systems

Last week Google posted new enhancements for GrandCentral, which is in fact evolving into Google Voice. In general, it is an application to better manage your voice communications.

The new application improves the way you use your phone. You can get transcripts of your voicemail (see the video below) and archive and search all of the SMS text messages you send and receive. You can also use the service to make low-priced international calls and easily access Goog-411 directory assistance. These are additions to the standard GrandCentral features, including a single number to ring your home, work, and mobile phones, a central voicemail inbox that you can access on the web, and the ability to screen calls by listening in live as callers leave a voicemail. You'll find these features, and more, in the Google Voice preview. Check out the features page for videos and more information on how these features work. It is great to have visual voicemail, and this will further enhance the iPhone and similar smartphones.

Google is taking its voice recognition into prime time and into a very delicate position, as it exposes all the voice transcripts loud and clear with no option for human correction. I can only guess what will happen to voicemails in a foreign language, with a heavy accent, etc. We've all seen the mistakes in the Google speech recognition system used for mining the presidential campaign in the US and, trust me, most of us do not speak as clearly as Obama.

So as a user, you have several options for getting a transcription of your voicemail: SpinVox, Nuance and the smaller players (SimulScribe, Jott etc.), who rely on human review, and Google, which says that there may be errors but that it's the user's decision whether to rely on the provided transcripts.

I anticipate that frustration with the quality of Google Voice transcription is going to be a source of bad attitudes towards speech recognition technology (see the Dilbert cartoon I put in this blog in the past). While other systems had mechanisms to "hide" the embarrassing mistakes, here the whole transcript will be visible and may relate to mission-critical information.

My recommendation to Google is to open an interface for providing manual transcripts. Either this can be connected to the SpinVox API (announced recently), or alternatively Google can open an interface that allows sending a proposed transcript plus audio and receiving back a corrected transcript. This can open a new market for people who are willing to send their voice communication to remote secretaries (probably in India) who will transcribe the audio and return it to Google Voice. This can also be a perfect fit for Amazon’s Mechanical Turk service, where you can get people to perform simple tasks. If you want further info, contact me directly about the way it should be constructed, including adaptive language ID etc.
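To make the "proposed transcript plus audio in, corrected transcript out" idea a bit more concrete, here is a rough Python sketch of what such an interface could look like. Everything in it is my own assumption for illustration: the names CorrectionRequest, CorrectionResponse, request_correction and demo_reviewer are hypothetical, and nothing here is an actual Google Voice, SpinVox or Mechanical Turk API. The reviewer is just a pluggable function, so it could stand in for a human transcription service or a crowdsourcing task.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class CorrectionRequest:
    """Hypothetical payload a voicemail service would send out for review."""
    message_id: str
    audio: bytes                # raw voicemail audio
    proposed_transcript: str    # machine transcript to be corrected
    language_hint: str = "und"  # "und" = undetermined; the reviewer may override it


@dataclass(frozen=True)
class CorrectionResponse:
    """Hypothetical payload the reviewer sends back."""
    message_id: str
    corrected_transcript: str
    language: str
    changed: bool               # True if the reviewer altered the transcript


# A reviewer takes a request and returns (corrected text, detected language).
Reviewer = Callable[[CorrectionRequest], Tuple[str, str]]


def request_correction(req: CorrectionRequest, reviewer: Reviewer) -> CorrectionResponse:
    """Send the proposed transcript plus audio to a reviewer and wrap the result."""
    text, language = reviewer(req)
    text = text.strip()
    return CorrectionResponse(
        message_id=req.message_id,
        corrected_transcript=text,
        language=language,
        changed=(text != req.proposed_transcript.strip()),
    )


def demo_reviewer(req: CorrectionRequest) -> Tuple[str, str]:
    """Stand-in for a human reviewer: applies one canned fix and reports English."""
    fixes = {"wreck a nice beach": "recognize speech"}
    text = req.proposed_transcript
    for wrong, right in fixes.items():
        text = text.replace(wrong, right)
    return text, "en"


if __name__ == "__main__":
    req = CorrectionRequest("vm-1", b"\x00\x01", "it is hard to wreck a nice beach")
    print(request_correction(req, demo_reviewer))
```

Keeping the reviewer as a plain function means the same request and response shapes could be routed to different back ends (a transcription vendor, a remote secretary, a crowdsourcing queue) without the voicemail application knowing which one handled it.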