Recently, a new version of OpenEars was announced. The main feature of the new release, 1.3.0, is an upgrade to the latest CMUSphinx codebase, pocketsphinx-0.8. This upgrade should bring additional stability and performance, so you are welcome to try it!
OpenEars is the most popular free offline speech recognition and text-to-speech framework on iOS, and the basis for the OpenEars Platform, a plugin system that lets you drag-and-drop new speech capabilities into your iOS app.
If you are interested in examples of applications built with CMUSphinx and the OpenEars framework, please visit this cool project. Photo editing can be a challenging task, and it becomes even more difficult on small, portable screens such as those of camera phones, which are now frequently used to edit images. To address this problem, PixelTone, a multimodal photo editing interface that combines speech and direct manipulation, was created:
This truly creative application demonstrates how powerful a multimodal application built with CMUSphinx can be. Your application could be the next voice-enabled one!
Pocketsphinx is a great alternative to closed-source vendor SDKs due to its open-source nature, extensibility and features. If you are looking to implement a speech application on Android, feel free to try Pocketsphinx. To get started, you can look at existing applications like Inimesed.
It's a great application for selecting contacts by voice, and you can install it on your device with a single click.
The sources and related materials are available on GitHub. Many thanks to Kaarel Kaljurand for his great software!
If you know some other applications using CMUSphinx, feel free to share!
A new English language model (updated) is available for download on our new torrent tracker.
This is a good trigram language model for general transcription, trained on various open sources, for example Gutenberg texts.
It achieves good transcription performance on various types of
texts; for example, on the following test sets the perplexities are:
Besides the transcription task, this model should also be significantly better on conversational data such as movie transcripts.
The language model was pruned with a beam of 5e-9 to reduce its size. It can be pruned further if needed, or the vocabulary can be reduced to fit the target domain.
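If you want to prune the model further yourself, one common way is SRILM's `ngram` tool with its `-prune` option. This is a hedged sketch: the input and output filenames below are placeholders, not the actual names of the distributed model, and it assumes SRILM is installed on your machine.

```shell
# Prune a trigram ARPA model with a tighter threshold than the
# distributed 5e-9 beam, trading some accuracy for a smaller file.
# (en-large.lm.gz / en-pruned.lm.gz are example names, not the
# real distribution filenames.)
ngram -order 3 \
      -lm en-large.lm.gz \
      -prune 1e-8 \
      -write-lm en-pruned.lm.gz
```

A larger `-prune` threshold removes more n-grams whose removal changes the model's perplexity the least, so you can tune it to hit a target model size for your domain.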
Modern speech recognition algorithms require enormous amounts of data to estimate speech parameters. Audio recordings, transcriptions, texts for language models, pronunciation dictionaries and vocabularies are collected by speech developers. While this may not be the case in the future, and better algorithms might require just a few examples, today you need to process thousands of hours of recordings to build a speech recognition system.
Estimates show that a human receives thousands of hours of speech before learning to understand it. Note that humans also have prior knowledge structures embedded in the brain that we are not aware of. Google trains its models on 100 thousand hours of audio recordings and petabytes of transcriptions, yet it is still behind human performance in speech recognition tasks. For search queries Google still has a word error rate of 10%, and for YouTube its word error rate is over 40%.
While Google has vast resources, so do we. We can certainly collect, process and share even more data than Google has. The first step in this direction is to create shared storage for audio data and CMUSphinx models.
We created a torrent tracker specifically to distribute legal speech data related to CMUSphinx, speech recognition, speech technologies and natural language processing. Thanks to Elias Majic, the tracker is available at
Currently the tracker contains torrents for the existing acoustic and language models, but new, more accurate models for US English and other languages will be released soon.
We encourage you to make other speech-related data available through our tracker. Please contact the firstname.lastname@example.org mailing list if you want to add your data set to the tracker.
Please help us distribute the data: start a client on your host and make the data available to others.
To learn more about BitTorrent, visit this link or search the web; there is a vast amount of material about it.
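As a sketch of what "starting a client" can look like, here is one way to download a torrent and keep seeding it afterwards using the aria2 command-line client. The torrent filename is a placeholder, and this assumes aria2 is installed; any BitTorrent client works equally well.

```shell
# Download the data set and keep seeding it for others.
# --seed-ratio=0.0 tells aria2 to seed indefinitely rather than
# stopping at a fixed upload/download ratio.
# (model-data.torrent is a placeholder filename.)
aria2c --seed-ratio=0.0 model-data.torrent
```

Leaving the client running after the download completes is what actually helps: every extra seeder increases the download speed and availability of the data for everyone else.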
You might wonder what the next step is. Pretty soon we will be able to run a distributed acoustic model training system, training acoustic models using vast amounts of distributed data and computing power. With a BOINC-grid computation network of CMUSphinx tools, together we will create the most accurate models for speech. Stay tuned.