Sonalight, which showed off its product at this week’s Y Combinator Demo Day, thinks voice tech is better put to use tackling real issues users have with their mobiles in everyday settings, like texting while driving. Sonalight actually employs Google’s own existing voice recognition tech, in combination with the CMU Sphinx open source software, to achieve its results. This is a great use case for CMUSphinx.
Visit
To try it.
We are pleased to announce that the CMUSphinx project has been accepted into the Google Summer of Code 2012 program. This will enable us to help several students get started in speech recognition, open source development, and CMUSphinx. We are really excited about that.
http://www.google-melange.com/gsoc/org/google/gsoc2012/cmusphinx
If you are interested in participating as a student, the application period will open soon, but it’s better to start preparing your application right now. Feel free to contact us with any questions! For more details see:
https://cmusphinx.github.io/wiki/summerofcodestudents
If you would like to be a mentor, please sign in to the GSoC web application and add your ideas to the ideas list:
https://cmusphinx.github.io/wiki/summerofcodeideas
We invite you to participate!
iSound is a program built with the help of the CMU Sphinx-4 system. It is part of a thesis at the Faculty of Mathematics, Natural Sciences and Information Technologies in Koper, Slovenia. Its main goal is real-time visualization of the audio signal as a spectrogram (also known as a sonogram), which makes it possible to observe the sound visually.
This capability can be useful in many areas, such as phonetics, analysis of animal sounds, music, sonar/radar, speech processing, and seismology. In addition to basic spectrogram drawing, the application includes several features that make it more useful: image freezing, zoom control, signal frequency display, resizing, color scheme changes, and contrast adjustment.
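For readers unfamiliar with spectrograms, here is a minimal sketch of how one can be computed. It is not iSound's implementation (iSound is built on Sphinx-4); it only illustrates the general idea with a naive short-time Fourier transform in plain Python. The function names, the frame size of 64 samples, the hop of 32 samples, and the 1000 Hz test tone are all assumptions chosen for illustration.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Return a list of frames; each frame holds one magnitude per frequency bin."""
    # A Hann window reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        # Naive DFT: for real input only the first frame_size // 2 + 1 bins are unique.
        bins = []
        for k in range(frame_size // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            bins.append(abs(acc))
        frames.append(bins)
    return frames


def dominant_frequency(bins, sample_rate, frame_size):
    """Frequency in Hz of the strongest bin in a single frame."""
    k = max(range(len(bins)), key=bins.__getitem__)
    return k * sample_rate / frame_size


# Illustrative input: a synthetic 1000 Hz tone sampled at 8000 Hz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]
spec = spectrogram(tone)
print(len(spec), "frames,", len(spec[0]), "bins per frame")
print("dominant frequency:", dominant_frequency(spec[0], sample_rate, 64), "Hz")
```

Drawing each frame as a column of pixels, with magnitude mapped to color, gives the familiar time-frequency picture. A real-time tool would use an FFT rather than this quadratic-time DFT.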
Compared to other programs with similar functionality, it shows promising results in CPU, memory, and graphics usage. Tests were run on Windows XP, Windows 7, Ubuntu Linux, and OS X Lion. You can find the full test results in the diploma thesis, on page 40.
Author’s comment: “For future work i plan to publish research work as an article in the journal. Currently I’m working on idea, how to use similar technologies and develop a tool, which can help persons with hearing handicap.”
Find more information about the project on the web page of the author, Irman Abdić:
http://www.irmanabdic.com
In many languages the number of lexical forms is huge due to morphology. Even a simple vocabulary can contain several million forms and variations. It's hard to recognize such a big vocabulary because of the huge search space: the decoder is slow and the language model takes an enormous amount of memory.
Of course, the brute force approach makes sense and is actually quite successful, but better ones have already been suggested. For example, using a morphological segmenter, we can build a language model and an acoustic model that describe the same vocabulary with a much smaller number of subword units. Real words are assembled from chunks that are separate entities in the language model. This way the search space is represented efficiently and the speed is comparable to that of English models.
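As a rough illustration of the idea, the sketch below splits word forms into a stem and a marked ending, so that many surface forms share a small set of units. It is not the segmenter used in any CMUSphinx model: the ending inventory, the minimum stem length, the `+` marker, and the transliterated Russian word forms are all assumptions made up for this example.

```python
# Hypothetical ending inventory (transliterated Russian noun endings),
# sorted so that longer endings are tried first.
ENDINGS = sorted(["a", "u", "om", "e", "y", "ov", "am", "ami", "akh"],
                 key=len, reverse=True)
MIN_STEM = 3  # assumed lower bound that keeps short words such as "dom" whole


def segment(word):
    """Split a word into [stem, "+ending"], or return [word] if no ending fits."""
    for ending in ENDINGS:
        stem = word[:-len(ending)]
        if word.endswith(ending) and len(stem) >= MIN_STEM:
            return [stem, "+" + ending]
    return [word]


def join(units):
    """Glue "+ending" units back onto the preceding stem to recover full words."""
    words = []
    for unit in units:
        if unit.startswith("+") and words:
            words[-1] += unit[1:]
        else:
            words.append(unit)
    return words


# Case forms of two nouns ("stol" = table, "dom" = house), for illustration only.
forms = ["stol", "stola", "stolu", "stolom", "stole", "stoly", "stolov",
         "stolam", "stolami", "stolakh",
         "dom", "doma", "domu", "domom", "dome", "domov", "domam",
         "domami", "domakh"]

units = {unit for form in forms for unit in segment(form)}
print(len(set(forms)), "word forms are covered by", len(units), "subword units")
print(join(segment("stolami") + segment("domom")))
```

The language model is then trained on the unit sequence instead of the full words, and the decoder output is passed through something like `join` to restore readable text. With only two stems the saving is modest, but it grows quickly because every new stem reuses the same endings.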
The tricky part is segmenting the words properly. Because the pronunciation of the decomposed parts is not straightforward, it takes some effort to build the split. We are happy that our contributor Zamir Ostroukhov managed to solve that problem. He created the acoustic model from audiobooks in the Voxforge database and used large text corpora to create a morphologically segmented language model. This is a very promising approach for morphologically rich languages, so we look forward to seeing this framework become part of CMUSphinx. The framework could perhaps be extended to a multilevel speech representation that holds both subword and sentence-level items.
Check out Zamir's project:
https://github.com/zamiron/ru4sphinx
For more details on the approach, please see
"Large vocabulary continuous speech recognition of an inflected language using stems and endings" by Tomaž Rotovnik et al.
Download Russian audiobook model here, the morphological language model is included:
http://sourceforge.net/projects/cmusphinx/files/Acoustic and Language Models/Russian Audiobook Morphology Zero
For more details see
http://www.cis.hut.fi/projects/morpho/