Tuesday, June 05, 2007

Speech to Text coming to cell phones

This video introduces Morpheus's upcoming speech-to-text technology. It matters because of audio bandwidth. The phone network carries 64 kbps audio with roughly 4 kHz of bandwidth, which is fine for humans to understand but drops many of the higher frequencies that speech recognition engines need to improve their accuracy. That's why phone-based speech rec uses discrete grammars, which are simpler to recognize. Desktop speech rec can do full dictation because a high-quality audio path exists.
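To see where the 64 kbps and 4 kHz figures come from, here is a quick back-of-the-envelope sketch. It assumes standard telephone PCM (8,000 samples per second, 8 bits per sample); the 16 kHz "desktop" rate is only an illustrative assumption, not a figure from the video.

```python
# Telephone-grade PCM: 8,000 samples per second, 8 bits per sample.
PHONE_SAMPLE_RATE_HZ = 8_000
PHONE_BITS_PER_SAMPLE = 8

# Assumed desktop microphone rate, for comparison only.
DESKTOP_SAMPLE_RATE_HZ = 16_000


def bit_rate_bps(sample_rate_hz, bits_per_sample):
    """Raw bit rate of uncompressed single-channel audio."""
    return sample_rate_hz * bits_per_sample


def max_audio_bandwidth_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent (Nyquist limit)."""
    return sample_rate_hz / 2


print(bit_rate_bps(PHONE_SAMPLE_RATE_HZ, PHONE_BITS_PER_SAMPLE) / 1000, "kbps")
print(max_audio_bandwidth_hz(PHONE_SAMPLE_RATE_HZ), "Hz over the phone network")
print(max_audio_bandwidth_hz(DESKTOP_SAMPLE_RATE_HZ), "Hz on the assumed desktop path")
```

In practice the usable telephone band is a bit narrower than the Nyquist limit, but the point stands: everything above about 4 kHz is gone before a recognizer ever sees it.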

Morpheus (and others) use network-based speech rec. The user's device captures the audio and does some basic processing before streaming it as data. Network-based speech rec engines receive the data, do the recognition, and send back the recognized text. Not only does this avoid the audio bandwidth problems, it also avoids running the speech rec engine on a CPU-limited cell phone.
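The client side of that split might look roughly like the sketch below. Everything here is hypothetical: the frame size, the single log-energy feature, and the packet layout are stand-ins chosen to keep the example small, not a description of what Morpheus actually sends. Real front ends typically compute richer features per frame.

```python
import math
import struct

FRAME_SIZE = 160  # hypothetical: 10 ms of audio at an assumed 16 kHz capture rate
HEADER = "!IH"    # hypothetical header: sequence number (uint32), feature count (uint16)


def frames(samples, size=FRAME_SIZE):
    """Split samples into full frames, dropping any trailing partial frame."""
    for start in range(0, len(samples) - size + 1, size):
        yield samples[start:start + size]


def log_energy(frame):
    """One simple per-frame feature: log of the mean squared amplitude."""
    energy = sum(s * s for s in frame) / len(frame)
    return math.log(energy + 1e-10)  # small offset avoids log(0) on silence


def encode_packet(seq, features):
    """Pack a sequence number and float32 features in network byte order."""
    return struct.pack(f"{HEADER}{len(features)}f", seq, len(features), *features)


def decode_packet(packet):
    """Server side: recover the sequence number and features from a packet."""
    seq, count = struct.unpack_from(HEADER, packet)
    features = struct.unpack_from(f"!{count}f", packet, struct.calcsize(HEADER))
    return seq, list(features)


# One second of a synthetic 440 Hz tone standing in for captured speech.
samples = [math.sin(2 * math.pi * 440 * n / 16_000) for n in range(16_000)]
features = [log_energy(f) for f in frames(samples)]
packet = encode_packet(0, features)
print(len(features), "features in", len(packet), "bytes")
```

The phone only does cheap arithmetic per frame and ships a small data packet; the expensive recognition step stays on the server.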

Dictation accuracy still isn't perfect. The Morpheus video mentions about a 10% error rate, so it's not really ready for dictating blog posts from your phone. But accuracy is tightly tied to CPU power, which improves every year.
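For a sense of what a 10% error rate means, here is a small sketch of word error rate, the usual way dictation accuracy is scored. That the video's figure is a word error rate is my assumption; the video doesn't say how it was measured, and the sentences below are made up.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Standard dynamic-programming edit distance, one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


spoken = "the quick brown fox jumps over the lazy dog today"
recognized = "the quick brown fox jumps over the hazy dog today"
print(word_error_rate(spoken, recognized))
```

One wrong word in ten gives 0.1, so at that rate a typical blog paragraph would need several corrections.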

As the VUI design blog says, the recent acquisitions of BeVocal and TellMe are perhaps being driven by interest in network-based speech.
