Tuesday, March 31, 2009

Google Voice should hire Spinvox for message transcription

I just saw this post at: http://abrilliantblog.com/2009/03/google-voice-should-hire-spinvox-for-message-transcription/

I guess this is the layman's feedback on the Google Voice service, and exactly what I anticipated in:
Google Voice - the new threat to the reputation of speech recognition system.

Google Voice should hire Spinvox for message transcription

March 30th, 2009

I have been using GrandCentral for years and when they announced Google Voice I was elated. I could not wait for the new features including my absolute favorites being SMS and voice mail being transcribed to text and emailed and texted to you.

Well the SMS is great but the voice transcription is dreadful. I didn't call someone back last week mistakenly thinking it was a telemarketer. It was Bena Roberts of GoMoNews fame and a very important mover and shaker in the business. This kind of transcription mistake is not acceptable. I have a suggestion for Google Voice that I really hope they listen to, OUTSOURCE TO SPINVOX. Now I don't work for SpinVox, but I've used it and on dozens of friends very high regard and reference for the SpinVox service I'm willing to give this advice.

Google don't allow Google Voice to tarnish a perfectly fantastic service because you're not willing to spend a few dollars. Go test out SpinVox and if it beats your transcription (I guarantee it will) then please switch over.

TelStrat - TelStrat’s Engage™ Suite Now Understands Speech Analytics

Engage Analyze expands TelStrat's contact center product portfolio with true phonetic speech analytics intelligence

Orlando, Florida – March 30, 2009 – TelStrat, a global supplier of comprehensive contact center solutions, business call recording products, and leading-edge access network systems, today chose VoiceCon Orlando 2009 as the venue to announce Engage Analyze, the latest addition to its industry-leading contact center solution suite. Engage Analyze provides advanced speech analytics that equip organizations to transform voice calls into knowledge that can help them improve efficiency, increase compliance, and gain the competitive advantage.

Engage Analyze indexes and audio mines words and phrases buried in calls using a patented Phonetic Audio Search and Recognition Engine. Unlike older, less efficient speech-to-text approaches, phonetic speech search is not dependent on finite dictionary and grammar models which require constant maintenance. This makes it easy to accurately search for new competitors, product names, slang, and other dynamically changing terms.

Phonetic search technology also makes Engage Analyze fast – much faster than speech-to-text systems. Pre-processing or indexing of content is typically 60-80 times faster than real time, more than an order of magnitude faster than Large Vocabulary Conversational Speech Recognition (LVCSR) speech-to-text systems. Subsequent searches for words or phrases are incredibly quick, averaging over 30,000 times faster than real time and reaching rates up to 80,000 times faster.
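To put those multiples in perspective, here is a back-of-the-envelope calculation. It takes the figures quoted in the release at face value; the 1,000-hour archive size and the function name are my own assumptions for illustration, not numbers from TelStrat.

```python
def processing_hours(audio_hours: float, speedup: float) -> float:
    """Wall-clock hours needed to process audio at `speedup` times real time."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_hours / speedup


ARCHIVE_HOURS = 1000  # hypothetical archive size, not from the release

# Indexing at the quoted 60-80x real time
index_slow = processing_hours(ARCHIVE_HOURS, 60)
index_fast = processing_hours(ARCHIVE_HOURS, 80)

# One search pass at the quoted average of 30,000x real time
search_minutes = processing_hours(ARCHIVE_HOURS, 30_000) * 60

print(f"Indexing: {index_fast:.1f} to {index_slow:.1f} hours")
print(f"One search: {search_minutes:.1f} minutes")
```

Under these assumptions, indexing 1,000 hours of calls would take roughly 12.5 to 16.7 hours, and a single search over the whole archive about two minutes.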

Speech-to-text systems rely on limited, statistical sampling of calls, typically 3-5% of call volume, due to cost and complexity. The technology in Engage Analyze makes it possible to audio mine up to 100% of calls, in real time if desired. The product does all this without the massive computing power necessary for comparable LVCSR systems. With Engage Analyze, contact centers can now accurately analyze and recognize trends over thousands of hours of customer calls.

"With Engage Analyze we're bringing our customers advanced technology that provides them with a powerful tool to enhance business and customer intelligence," said TelStrat President Kevin Smith. "The search speed and recognition capabilities make this product a market leader, and we've made it affordable for organizations of virtually any size."

Engage Analyze is the newest component of TelStrat's Engage Contact Center Suite. Engage Suite blends full-featured voice and screen recording; intuitive agent performance evaluation, tracking and coaching; powerful agent scripting and call automation; sophisticated workforce forecasting and scheduling; and now, advanced speech analytics. It addresses each major aspect of contact center operations. Designed to benefit any 'center of contact', Engage Suite is ideal whether used by a large telemarketing firm or a small company's support staff.


Monday, March 30, 2009

Salesforce.com speech recognition integration by Jott

Jott recently announced a speech interface to salesforce.com. By leveraging its speech-to-text services, Jott has come up with an offering for enterprise users. It is another example of how a speech-to-text API (human-based or automatic) can be easily integrated into other SaaS applications. See also SpinVox Open API. A key question around this service will be what happens when the data is inaccurate due to a transcription error (whether human or machine error): what feedback will users receive from the system, and what is the process for fixing such an error? I guess the Jott people can shed some light on that, but so can similar service providers like SpinVox, Nuance, Google etc.

Jott Networks Introduces Jott for Salesforce, Makes Mobile CRM Input Insanely Simple With Voice-to-text

SEATTLE, March 19 /PRNewswire/ -- Jott Networks today announced the addition of Jott for Salesforce to its expanding line of mobile productivity services. The new service uses Jott's high-quality voice-to-text technology, and allows sales professionals to make a simple call on any phone to directly input opportunity updates, take quick notes, and set reminders and appointments - all hands-free. Jott is offering a one-month free trial of Jott for Salesforce and it comes with a free subscription to Jott Assistant Pro, Jott's widely acclaimed mobile productivity tool.

Jott CEO and co-founder John Pollard said, "There are already over a thousand businesses that use Jott's other services to get more done on the go. These same businesses wanted us to provide integration with more critical business applications and Jott for Salesforce is the first in that line." He added, "Sales professionals can now use our best-in-class voice-to-text technology to avoid the hassle of cramped and clumsy mobile interfaces and frustratingly slow connection speeds. Sales teams spend more time selling and less time typing reports, and sales managers see greater adoption of Salesforce and receive better, fresher forecast data." Jott for Salesforce includes the following features:

Features for Sales Professionals

  • Update opportunities and accounts - Use your voice to quickly update entire opportunities with a simple flow or individual forecast fields with shortcuts. Data ends up in specific accounts with no cutting, pasting or forwarding required.
  • Take quick notes - While they are still fresh, speak quick notes about accounts and opportunities and add tasks to your Salesforce dashboard.
  • Schedule appointments and set reminders - Use your voice to book a meeting on your Salesforce.com or Outlook calendar, and set reminders so you never forget.
  • Get confirmation of everything - Every update you leave comes with a confirmation email/text message so you know for certain that data was entered into your accounts.

Features for Managers

  • Set up in minutes - Jott for Salesforce is incredibly easy to set up. There are no desktop or phone downloads, and it requires no changes to your existing Salesforce set-up.
  • Scale easily - Jott for Salesforce was built for scale. With nothing to download or maintain, and no new equipment to buy, it easily accommodates individuals or organization-wide rollouts.
  • No training necessary - While training is available to help teams get the most out of Jott for Salesforce, only a few simple commands are needed to get started.

Pricing and Availability

Jott for Salesforce is available today from the App Exchange on Salesforce.com and from the Jott.com web site. It takes just a few minutes to set up, and is compatible with all carriers and all mobile phones in the US and Canada. Jott for Salesforce's pricing is straightforward and affordable at $25 per user per month. For that fee, users can send unlimited updates into Salesforce with no need to worry about overage charges.

For more information on Jott Salesforce and other Jott services, please visit www.Jott.com.

Sunday, March 29, 2009

Visual Speech Recognition: Lip Segmentation and Mapping

I just saw this free ebook, which may be of interest to the speech analytics community. See the link below.


Description: The unique research area of audio-visual speech recognition has attracted much interest in recent years as visual information about lip dynamics has been shown to improve the performance of automatic speech recognition systems, especially in noisy environments.

Visual Speech Recognition: Lip Segmentation and Mapping presents an up-to-date account of research done in the areas of lip segmentation, visual speech recognition, and speaker identification and verification. A useful reference for researchers working in this field, this book contains the latest research results from renowned experts with in-depth discussion on topics such as visual speaker authentication, lip modeling, and systematic evaluation of lip features.


Friday, March 20, 2009

SpinVox Open API

Congrats to SpinVox on this important move. Opening your system to others can drive speech applications forward faster by leveraging many more developers and creative minds. Whether it is human transcription or machine transcription, this move separates the speech processing part from the application part and pushes for SaaS speech-enabled applications.


SpinVox to Demonstrate Open API Applications at CTIA Wireless 2009

Pre-Registrations For SpinVox Create Fuel Co-Development Program and Confirm Demand for Voice Conversion Services from Web-Based Technology Developers Wanting to Build Speech 2.0 Applications

CTIA Wireless 2009

LONDON & NEW YORK--(BUSINESS WIRE)--SpinVox, the global leader in voice to content messaging, will showcase three brand new Speech 2.0 applications at CTIA Wireless 2009, to be held April 1-3 in Las Vegas. The applications have been developed in less than a month to demonstrate the power of SpinVox Create, an open API (Application Programming Interface) to the SpinVox Voice Message Conversion System™ (VMCS), the world’s largest commercial speech platform.

SpinVox Create was announced at Mobile World Congress, Barcelona in February 2009 and developers were invited to pre-register their interest in SpinVox Create in advance of its launch via a web registration page - www.spinvox.com/developer

SpinVox Create will be launched as a key part of a two-stage corporate API strategy that will also be announced at CTIA and rolled out by SpinVox in the first half of 2009.

Nearly 100 developers have already registered interest in SpinVox Create and, of these, 20 have been selected by SpinVox to be part of the co-development program. Those selected include business efficiency, personal productivity, games and social networking applications.

SpinVox Create is a simple, straightforward API that leverages SpinVox’s commercial speech platform – which is growing quickly with more than 30 million users - to enable any developer with Web access to quickly build commercial speech applications. It also enables SpinVox to collaborate with third parties to expand the Speech 2.0 market and foster further innovations in voice that complement SpinVox’s existing platform development services for Enterprise application partners and Carrier networks.

“We’ve been impressed by both the quality and quantity of responses to our pre-registration announcement,” says SpinVox co-founder and CEO Christina Domecq. “We are clearly seeing an increased demand for voice conversion services from technology developers who recognize that a speech interface enables the most natural form of communication, and who want to build Speech 2.0 applications with best-in-class open products.”

Demonstration of Voice innovation on Apple, Nokia, and Windows Mobile platforms

Three applications will be demonstrated at CTIA Wireless 2009. These are based on Apple iPhone, Nokia Series 60, and Microsoft Windows Mobile platforms.

`Travel Blog`, a Windows Mobile 6.0 Application developed by Singapore-based Global Idealogy Corporation lets you tag and post your photographs using just your voice. You can select photographs through the application, speak a message, attach the converted text to the photographs, post it on blog, social networking websites or send it as email or MMS.

`Speak-a-Text`, a Nokia Series 60 Application developed by UK-based Symbian Platinum Partner, Savage Minds, incorporates the ability to speak a text which is converted to text and placed into the menu structure of the phone software.

`Memo`, an iPhone Application, developed by UK-based SpinVox allows iPhone users to speak a memo through the iPhone application and after conversion into text by the SpinVox VMCS the memo resides on the iPhone for instant access whenever needed.

Drive the next upturn

SpinVox has already received pre-registrations for SpinVox Create across the globe and looking ahead expects rapid uptake of the API particularly in Silicon Valley where SpinVox Web 2.0 services such as SpinVox Blog and SpinVox Social Networks have been increasingly popular.

Adds Domecq, “SpinVox has created a new category - carrier-grade voice conversion - and now is helping talented developers take advantage of the next growth opportunity in speech. The potential for innovation between carriers and the web is enormous – along with our own innovations we're now delivering a platform for creation of market changing applications and supporting their transformation to carrier-grade services. Speech 2.0 applications will be one of a cluster of innovations that will drive the next upturn as people are increasingly enabled to re-discover the power of their voice.”

About SpinVox

SpinVox® is the world's largest privately-held speech technology company, providing the only voice to text messaging services which are used daily by millions of people and whose user base has grown over twenty-fold in the last 12 months.

Through significant innovations in voice and network technologies which are protected by over 40 patents worldwide, SpinVox has converged the two most natural forms of communication - voice and text - to create the fastest-growing form of messaging: Voice-to-Content™.

SpinVox services are available directly on www.spinvox.com and through leading carriers and through new media, Unified Communications and other service providers globally.

Implemented as a carrier-class cloud service, SpinVox is proven to be able to easily create value from everyday user behavior using voice and deliver rapid and easy implementation of low input, sustained high reward services.

At the heart of SpinVox is its ground-breaking Voice Message Conversion System™ (VMCS), which works by combining state-of-the-art speech technologies with a live-learning language process. Developed by the Cambridge, UK-based SpinVox Advanced Speech Group, VMCS now serves users across five continents in English, French, Spanish, German, Portuguese and Italian.

SpinVox is now live with Alltel, Cincinnati Bell, Sasktel, Rogers Wireless, Telus, Telstra, Vodacom South Africa, Vodafone Spain, Movistar Chile, Skype and Livejournal.

Wednesday, March 18, 2009

SpinVox response for Google voice - where is the Nuance response?

Following the recent Google Voice voicemail transcription announcement, SpinVox is responding.
In a recent post, Rich Tehrani of TMC quotes a response from SpinVox about Google Voice:

Google is entering a marketplace that continues to be led by SpinVox, the world's largest privately-held speech technology company. We're excited by the launch of Google Voice because it will demonstrate the benefits of speech-to-text conversion and validate its deployment as a network service to an increased audience. We have already launched carrier-grade services with 13 operators - including recently with Skype - on five continents and SpinVox is in use by in excess of 30 million users. SpinVox's 97 percent accuracy in conversion is now the benchmark around the world. - Christina Domecq, co-founder and CEO of SpinVox

And this 97% reflects human transcription results. I am not sure that Google will post any response, but we need to wait for usability feedback from users.
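The quote does not say how the 97 percent was measured. One common convention, which I am assuming here purely for illustration, is word accuracy: one minus the word error rate, where the error rate is the word-level edit distance between a reference transcript and the produced transcript, divided by the number of reference words. A minimal sketch (the function names and the sample sentences are mine):

```python
def word_edit_distance(reference: list[str], hypothesis: list[str]) -> int:
    """Levenshtein distance counted in words rather than characters."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from hypothesis
                            curr[j - 1] + 1,      # extra word in hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """1 - WER; can drop below zero if the hypothesis has many extra words."""
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 1.0 - word_edit_distance(ref_words, hyp_words) / len(ref_words)


# Invented example: one wrong word out of six
accuracy = word_accuracy("please call me back this afternoon",
                         "please call me bank this afternoon")
print(f"{accuracy:.3f}")
```

If the figure is indeed word accuracy, 97% still means that a 30-word voicemail would contain about one wrong word on average, which may or may not be the word that matters.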

I also wonder what Nuance's response to Google Voice will be.

Sunday, March 15, 2009

Google Voice - the new threat to the reputation of speech recognition system

Last week Google announced new enhancements to GrandCentral, which is in fact evolving into Google Voice. In general, it is an application to better manage your voice communications.

The new application improves the way you use your phone. You can get transcripts of your voicemail (see the video below) and archive and search all of the SMS text messages you send and receive. You can also use the service to make low-priced international calls and easily access Goog-411 directory assistance. These are additions to the standard GrandCentral features, including a single number to ring your home, work, and mobile phones, a central voicemail inbox that you can access on the web, and the ability to screen calls by listening in live as callers leave a voicemail. You'll find these features, and more, in the Google Voice preview. Check out the features page for videos and more information on how these features work. It is great to have visual voicemail, and this will further enhance the iPhone and similar smartphones.

Google is taking its voice recognition into prime time and into a very delicate position, as it exposes all the voice transcripts loud and clear with no option for human correction. I can only guess what will happen to voicemails in foreign languages, with heavy accents etc. We've all seen the mistakes in the Google speech recognition system used for mining the presidential campaign in the US and, trust me, most of us do not speak as clearly as Obama.

So as a user, there are several options for getting a transcription of your voicemail: SpinVox, Nuance and all the other smaller players (SimulScribe, Jott etc.), who rely on human review, and Google, which says that there may be errors but that it is the user's decision whether to rely on the provided transcripts.

I anticipate that frustration with the quality of Google Voice transcription is going to be a source of bad attitudes towards speech recognition technology (see the Dilbert cartoon I posted on this blog in the past). While other systems had mechanisms to "hide" embarrassing mistakes, here the entire transcript will be visible and may relate to mission-critical information.

My recommendation to Google is to open an interface for providing manual transcripts. Either this can be connected to the SpinVox API (announced recently), or alternatively Google can open an interface that allows sending a proposed transcript plus audio and receiving back a corrected transcript. This could open a new market for people willing to send their voice communications to remote secretaries (probably in India) who will transcribe the audio and return it to Google Voice. This could also be a perfect fit for Amazon's Mechanical Turk service, where you can get people to perform simple tasks. If you want further info, contact me directly about the way it should be constructed, including adaptive language ID etc.
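To make the idea of a correction interface a bit more concrete, here is a rough sketch of what the request and response could look like. Everything in it is hypothetical: the type names, the fields and the reviewer callback are my own, and none of this is an actual Google Voice, SpinVox or Mechanical Turk API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class CorrectionRequest:
    message_id: str
    audio: bytes                         # raw voicemail audio
    proposed_transcript: str             # machine output to be reviewed
    language_hint: Optional[str] = None  # e.g. "en-US"; None if unknown


@dataclass(frozen=True)
class CorrectionResponse:
    message_id: str
    corrected_transcript: str
    changed: bool                        # True if the reviewer altered the text


# A reviewer is anything that turns a request into corrected text:
# a human transcription service, a crowd worker queue, or another engine.
Reviewer = Callable[[CorrectionRequest], str]


def request_correction(req: CorrectionRequest, reviewer: Reviewer) -> CorrectionResponse:
    """Send audio plus proposed transcript to a reviewer and wrap the result."""
    corrected = reviewer(req).strip()
    if not corrected:
        # Keep the machine transcript if the reviewer returns nothing.
        corrected = req.proposed_transcript
    return CorrectionResponse(
        message_id=req.message_id,
        corrected_transcript=corrected,
        changed=(corrected != req.proposed_transcript),
    )


# Stand-in reviewer for demonstration only; a real one would listen to req.audio.
def demo_reviewer(req: CorrectionRequest) -> str:
    return req.proposed_transcript.replace("call me bank", "call me back")


req = CorrectionRequest("vm-001", b"\x00\x01", "Please call me bank.", "en-US")
resp = request_correction(req, demo_reviewer)
print(resp.corrected_transcript, resp.changed)
```

Keeping the reviewer behind a plain callback is a deliberate choice in this sketch: the same request and response shapes would work whether the correction comes from a transcription vendor, a remote secretary or a crowd worker.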