We are pleased to announce that today a pack of CMUSphinx packages was released:
For the download links see:
The biggest update of this release is a new sphinxtrain. The code sharing between sphinxbase and sphinxtrain significantly increased bringing more consistent codebase and interface, accurate memory management and increased usability.
Beside that, a single sphinxtrain binary is introduced to provide an easy and flexible access to the whole training procedure. In the future we hope to reduce the amount of Perl scripts in training setup and to port everything on Python. This will open the access to an advanced Python ecosystem including scientific packages, graphics and distributed computing.
Another notable change of this release in a new openfst-based G2P framework implemented during Google Summer of Code. Credits for this should go to Josef Robert Novak and John Salatas. This framework is also supported by sphinx4 and provides a uniform and accurate algorithm to create dictionaries from word lists.
A numerous bug fixes and improvements were submitted by our contributors. We should be grateful to the great developers who made this release possible. Many thanks to our star team, which is impressively long:
For more detailed information see the NEWS file in the corresponding packages.
The new sphinx4 package and an android demo using pocketsphinx will be released soon, finalizing the release cycle. After that, a great new features will start their way into codebase. Stay tuned.
For those who are interested in CMUSphinx on mobile, please check out the PolitePix blog where you could find some interesting ideas about pocketsphinx on iPhone:
OpenEars is the easiest way to try open offline speech recognition on iPhone platform. If you are interested to add speech recognition to your iPhone application, you should definitely check it out.
(Author: Srikanth Ronanki)
(Status: GSoC 2012 Pronunciation Evaluation Final Report)
This article briefly summarizes the implementation of GSoC 2012 Pronunciation Evaluation project.
Primarily, I started with sphinx forced-alignment and obtained the spectral matching acoustic scores, duration at phone, word level using WSJ models. After that I tried concentrating mainly on two things. They are edit-distance neighbor phones decoding and Scoring routines for both Text-dependent and Text-independent systems as a part of GSoC 2012 project.
Edit-distance Neighbor phones decoding:
1. Primarily started with single-phone decoder and then explored three-phones decoder, word decoder and complete phrase decoder by providing neighbor phones as alternate to the expected phone.
2. The decoding results shown that both word level and phrase level decoding using JFGF are almost same.
3. This method helps to detect the mispronunciations at phone level and to detect homographs as well if the percentage of error in decoding can be reduced.
This method is based on exemplars for each phrase. Initially, mean acoustic score, mean duration along with deviations are calculated for each of the phone in the phrase based on exemplar recordings. Now, given the test recording, each phone in the phrase is then compared with exemplar statistics. After that, z-scores are calculated and then normalized scores are calculated based on maximum and minimum of z-scores from exemplar recordings. All phone scores are aggregated to get word score and then all word scores are aggregated with POS weight to get complete phrase score.
This method is based on predetermined statistics built from any corpus. Here, in this project, I used TIMIT corpus to build statistics for each phone based on its position (begin/middle/end) in the word. Given any random test file, each phone acoustic score, duration is compared with corresponding phone statistics based on contextual information. The scoring method is same as to that of Text-dependent system.
Please try our demo @ http://talknicer.net/~ronanki/test/ and help us by giving the feedback.
Documentation and Codes
All codes are uploaded at CMUSphinx svn @ http://cmusphinx.svn.sourceforge.net/viewvc/cmusphinx/branches/speecheval/ronanki/ and raw documentation of the project can be found here.
The pronunciation evaluation system really helps second-language learners to improve their pronunciation by trying multiple times and it lets you correct your-self by giving necessary feedback at phone, word level. I couldn't complete some of the things like CART modelling I have mentioned earlier during the project. But I hope that I can keep my contributions to this project in future as well.
This summer has been a great experience to me. Google Summer of code 2012 has finally ended. As a final note, the current article is just a summary of the work during the project, an extensive set of documentation will be updated at https://cmusphinx.github.io/wiki/faq#qhow_to_implement_pronunciation_evaluation. You can also read more about this project and weekly progress reports at http://pronunciationeval.blogspot.in/
(author: Troy Lee)
This article briefly summarized the Pronunciation Evaluation Web Portal Design and Implementation for the GSoC 2012 Pronunciation Evaluation Project.
The pronunciation evaluation system mainly consists following components:
1) Database management module: Store, retrieve and update all the necessary information including both user information and various data information such as phrases, words, correct pronunciations, assessment scores and etc.
2) User management module: New user registration, information update, change/reset password and so on.
3) Audio recording and playback module: Recording the user's pronunciation for further processing.
4) Exemplar verification module: Justify whether a given recording is an exemplar or not.
5) Pronunciation assessment module: Provide numerical evaluation at the phoneme level (which could be aggregated to form higher level evaluation scores) in both acoustic and duration aspects.
6) Phrase library module: Allow users to create new phrases into the database for evaluation.
7) Human evaluation module: Support human experts to evaluate the users' pronunciations which could be compared with the automatically generated evaluations.
The website could be tested at http://talknicer.net/~li-bo/datacollection/login.php. Do let me know (email@example.com) once you encounter any problem as the site needs quite a lot testing before it works robustly. The complete setup of the website could be found at http://cmusphinx.svn.sourceforge.net/viewvc/cmusphinx/branches/speecheval/troy/. More detailed functionality and implementations could be found in a more manual like report:
Although it is the end of this GSoC, it is just the start of our project that leveraging on open source tools to improve people's lives around the world using speech technologies. We are currently preparing using Amazon Mechanical Turk to collect more exemplar data through our web portal to build a rich database for improved pronunciation evaluation performance and further making the learning much more fun through gamification.