After one week of steady work, I am finally making my first post on my results and findings.
For a detailed description of the project, please read here.
I started with a few experiments on various grammars to see which performed best and in which scenarios. By manipulating the grammar alone, I could only reach a word error rate of almost 18% for audio files that were almost 6 minutes long. Some observations from these experiments were:
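For context, word error rate is conventionally computed as the word-level Levenshtein distance (substitutions + insertions + deletions) between the reference transcription and the hypothesis, divided by the reference length. A minimal, self-contained Java sketch (class and method names are my own, not part of CMUSphinx):

```java
public class WordErrorRate {
    // WER = (substitutions + insertions + deletions) / reference length,
    // computed with dynamic programming over word sequences.
    static double wer(String reference, String hypothesis) {
        String[] ref = reference.trim().split("\\s+");
        String[] hyp = hypothesis.trim().split("\\s+");
        int[][] d = new int[ref.length + 1][hyp.length + 1];
        for (int i = 0; i <= ref.length; i++) d[i][0] = i; // all deletions
        for (int j = 0; j <= hyp.length; j++) d[0][j] = j; // all insertions
        for (int i = 1; i <= ref.length; i++) {
            for (int j = 1; j <= hyp.length; j++) {
                int sub = d[i - 1][j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                d[i][j] = Math.min(sub, Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1));
            }
        }
        return (double) d[ref.length][hyp.length] / ref.length;
    }

    public static void main(String[] args) {
        // One substitution (quick -> quack) and one insertion (jumps)
        // against a 4-word reference: WER = 2/4 = 0.5.
        System.out.println(wer("the quick brown fox", "the quack brown fox jumps"));
    }
}
```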
A source of error in alignment comes from words in the text that are not in the dictionary (out-of-vocabulary words). It was hence proposed to provide a Java-based module for generating phonetic representations for any such word. This module trains an FST on test data and uses it to hypothesize word pronunciations. As of now, the front end for this module is nearly complete; it depends on a Java automata library (which, as it seems, does not yet exist). We now plan to implement this library and then test the improvement in alignment from this addition.
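To make the idea concrete: a grapheme-to-phoneme module maps the letters of an unknown word to a phone sequence. The real module would learn these mappings as an FST from training data; the sketch below instead uses a tiny hand-written rule table with greedy longest-match, purely to illustrate hypothesizing a pronunciation for an OOV word. The class name, rule table, and phone choices (CMUdict-style phones) are all my own illustrative assumptions:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class G2PSketch {
    // Hypothetical letter-to-phone rules. A trained FST would replace
    // this table and score competing hypotheses instead of committing
    // greedily to the first match.
    static final Map<String, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put("sh", "SH");
        RULES.put("ch", "CH");
        RULES.put("ee", "IY");
        RULES.put("oo", "UW");
        RULES.put("a", "AE");
        RULES.put("b", "B");
        RULES.put("c", "K");
        RULES.put("d", "D");
        RULES.put("e", "EH");
        RULES.put("n", "N");
        RULES.put("s", "S");
        RULES.put("t", "T");
    }

    static String pronounce(String word) {
        StringBuilder phones = new StringBuilder();
        String w = word.toLowerCase();
        int i = 0;
        while (i < w.length()) {
            String phone = null;
            // Longest match first: try a two-letter rule, then one-letter.
            for (int len = 2; len >= 1 && phone == null; len--) {
                if (i + len <= w.length()) {
                    phone = RULES.get(w.substring(i, i + len));
                    if (phone != null) i += len;
                }
            }
            if (phone == null) { i++; continue; } // skip letters with no rule
            if (phones.length() > 0) phones.append(' ');
            phones.append(phone);
        }
        return phones.toString();
    }

    public static void main(String[] args) {
        System.out.println(pronounce("sheen")); // "sh" + "ee" + "n"
    }
}
```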
We would like to thank applicants for putting the time and effort into creating GSoC applications to work on CMUSphinx. We were ultimately provided with two slots and had many great applications, which made choosing very difficult. We hope that students who were not accepted will still get involved with CMUSphinx, and we look forward to receiving your applications next year.
We are pleased to announce that two spots were awarded to Michal Krajňanský and Apurv Tiwari.
Michal is a student at Masaryk University in Brno, Czech Republic, studying Informatics with a focus on Artificial Intelligence & Natural Language Processing. Michal will be working on training acoustic models on long audio files. He will optimize SphinxTrain to exploit massively parallel hardware through the NVIDIA CUDA framework, which will reduce the memory requirements of the Baum-Welch algorithm and significantly speed up training. He will also modify SphinxTrain to be able to process long input audio files.
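Why memory is the bottleneck: Baum-Welch needs the full T x N trellis of forward (and, symmetrically, backward) probabilities to compute its state-occupancy statistics, so storage grows linearly with the number of frames T, which is what hurts on long recordings. A toy forward pass in plain Java (all model values are made up for illustration; SphinxTrain's actual implementation differs):

```java
public class ForwardTrellis {
    // Forward pass of a toy discrete HMM. Baum-Welch retains the whole
    // alpha[t][state] trellis, so memory scales with utterance length.
    static double[][] forward(double[] pi, double[][] a, double[][] b, int[] obs) {
        int n = pi.length, t = obs.length;
        double[][] alpha = new double[t][n];
        for (int j = 0; j < n; j++) alpha[0][j] = pi[j] * b[j][obs[0]];
        for (int k = 1; k < t; k++) {
            for (int j = 0; j < n; j++) {
                double s = 0;
                for (int i = 0; i < n; i++) s += alpha[k - 1][i] * a[i][j];
                alpha[k][j] = s * b[j][obs[k]];
            }
        }
        return alpha;
    }

    public static void main(String[] args) {
        double[] pi = {0.6, 0.4};                      // initial probabilities
        double[][] a = {{0.7, 0.3}, {0.4, 0.6}};       // transitions
        double[][] b = {{0.5, 0.5}, {0.1, 0.9}};       // emissions
        double[][] alpha = forward(pi, a, b, new int[]{0, 1, 1});
        // Summing the last column gives P(observations | model).
        System.out.println(alpha[2][0] + alpha[2][1]);
    }
}
```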
Apurv is a student at the Indian Institute of Technology Delhi in New Delhi, India, studying Mathematics and Computing. Apurv will be working on adding long audio alignment to CMUSphinx. The problem he will solve is to align a given approximate transcription to the corresponding audio file, as well as to improve the transcription at points of low confidence.
The mentor team includes Prof. James Baker and Prof. Bhiksha Raj, as well as all the members of our community.
Both Apurv and Michal will blog weekly about their experience. The blogs will appear here at https://cmusphinx.github.io/
We want to thank Google for providing this wonderful opportunity and the mentors for donating their valuable time. We eagerly anticipate great things from Apurv and Michal. Stay tuned!
We are pleased to announce the availability of the updated CMUSphinx toolkit. You can find updated sphinxbase, pocketsphinx, sphinxtrain, cmuclmtk and sphinx4 in the downloads section:
https://cmusphinx.github.io/wiki/download/
Major changes include
See the NEWS file in each package for more details. More changes are coming soon — enjoy!
We are pleased to announce that the CMUSphinx project has been accepted into the Google Summer of Code 2011 program. This will enable us to help several students start their way in speech recognition, open source development, and CMUSphinx. We are really excited about it.
http://www.google-melange.com/gsoc/program/home/google/gsoc2011
If you are interested in participating as a student, the application period will open soon, but it's better to start preparing your application right now. Feel free to contact us with any questions! For more details see:
https://cmusphinx.github.io/wiki/summerofcodestudents
If you would like to be a mentor, please sign in to the GSoC web application and add your ideas to the ideas list:
https://cmusphinx.github.io/wiki/summerofcodeideas
We invite you to participate!