Wednesday, July 16, 2008

Google Video Search via Speech Recognition

Finally, a hint of Google's expected move into the speech recognition arena.

Google announced on the Official Google Blog the availability of a new video search capability based on speech recognition.

It was released as a gadget you can embed on your iGoogle homepage, and it is a good preview of things to come.
The gadget only searches videos uploaded to YouTube's Politicians channels, which include videos from Senator Obama's and Senator McCain's campaigns, as well as those from dozens of other candidates and politicians. It usually takes no more than a few hours for a video to appear in the index after it has been published on YouTube.

So, apart from congratulating the Google team on their first public exposure, how does their work compare to other speech recognition engines aimed at broadcast-quality content? The Google team comments on accuracy: "While some of the transcript snippets you see may not be 100% accurate, we hope that you'll find the product useful for most purposes." I do not see which purposes are served by searching only within YouTube's political channels, but people should be aware of much more mature solutions developed over the past years, from the pioneering work of BBN and IBM to existing online services such as EveryZing, TVEyes, blinkx, snipp.tv by NSC and others.

Judging by the perceived quality, the Google team has a long way to go before it reaches the top tier and can analyze data that is not of broadcast quality. The good news for users is that YouTube audio is easier to process than telephone calls, which are widely analyzed with speech recognition in contact centers today by companies such as Verint, Nice, Autonomy, Utopy, Nexidia, CallMiner and other players.

1 comment:

Alan said...

Speech recognition software may contribute to online search capabilities, but if speech writers and their delivery agents (politicians) were to edit down speeches and stick to the point, "talking the talk", then a two-fold benefit might occur:

1) speech recognition software might do better, and

2) humans reading this might experience less information overload.