CMUSphinx Open Source Speech Recognition

Mar 25, 2010

Development Meeting Notes

There were two great development meetings in Dallas during ICASSP. The goal was to draft a roadmap describing what will happen in the near future. The first meeting was for discussion; the second was for reviewing the statements and making an action plan. The attendees were:

Both meetings:

Bhiksha Raj
Rita Singh
Evandro Govea
Nick Shmyrev
David Huggins-Daines

Only first meeting:

Benoit Favre
Jagadeesh Balam
Alex Rudnicky (over phone)

The following topics were discussed.

Development directions. A WFST-based decoder as part of the CMUSphinx tools has been long awaited. Such decoders are considered very interesting because language models, acoustic models, the dictionary and even result lattices are unified under a common data framework. Training could be done using OpenFst tools. Such a framework could appear in the near future.
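To illustrate why a common framework is attractive, here is a minimal toy sketch of weighted transducer composition followed by a best-path search. It is not the OpenFst API and not a CMUSphinx design: the dictionary layout, the function names `compose` and `best_path`, the slot labels `t1`/`t2` and all weights are made up for illustration. It assumes epsilon-free machines and non-negative costs that add along a path.

```python
import heapq
from collections import defaultdict, deque

def compose(a, b):
    """Compose two epsilon-free weighted transducers; costs add along arcs."""
    arcs_a, arcs_b = defaultdict(list), defaultdict(list)
    for src, ilabel, olabel, w, dst in a["arcs"]:
        arcs_a[src].append((ilabel, olabel, w, dst))
    for src, ilabel, olabel, w, dst in b["arcs"]:
        arcs_b[src].append((ilabel, olabel, w, dst))
    start = (a["start"], b["start"])
    result = {"start": start, "finals": {}, "arcs": []}
    queue, seen = deque([start]), {start}
    while queue:
        qa, qb = queue.popleft()
        if qa in a["finals"] and qb in b["finals"]:
            result["finals"][(qa, qb)] = a["finals"][qa] + b["finals"][qb]
        for ilabel, mid, w1, na in arcs_a[qa]:
            for mid2, olabel, w2, nb in arcs_b[qb]:
                if mid == mid2:  # output of a must match input of b
                    nxt = (na, nb)
                    result["arcs"].append(((qa, qb), ilabel, olabel, w1 + w2, nxt))
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
    return result

def best_path(fst):
    """Dijkstra search for the cheapest accepting path; returns (cost, outputs)."""
    arcs = defaultdict(list)
    for src, _ilabel, olabel, w, dst in fst["arcs"]:
        arcs[src].append((olabel, w, dst))
    heap, done, tick = [(0.0, 0, fst["start"], ())], set(), 1
    while heap:
        cost, _, state, out = heapq.heappop(heap)
        if state is None:  # sentinel pushed when a final state was reached
            return cost, out
        if state in done:
            continue
        done.add(state)
        if state in fst["finals"]:
            heapq.heappush(heap, (cost + fst["finals"][state], tick, None, out))
            tick += 1
        for olabel, w, dst in arcs[state]:
            heapq.heappush(heap, (cost + w, tick, dst, out + (olabel,)))
            tick += 1
    return None

# Hypothetical "acoustic" machine: time slots map to word candidates with costs.
A = {"start": 0, "finals": {2: 0.0}, "arcs": [
    (0, "t1", "two", 0.5, 1), (0, "t1", "too", 0.25, 1),
    (1, "t2", "cats", 0.25, 2), (1, "t2", "cuts", 0.5, 2)]}
# Hypothetical "language model" acceptor over the same words.
G = {"start": 0, "finals": {2: 0.0}, "arcs": [
    (0, "two", "two", 0.25, 1), (0, "too", "too", 1.5, 1),
    (1, "cats", "cats", 0.25, 2), (1, "cuts", "cuts", 1.0, 2)]}

result = best_path(compose(A, G))
print(result)
```

In this toy, "too" is acoustically cheaper than "two", but the language model cost makes "two cats" the cheapest path overall. The point is that both knowledge sources are the same kind of object and one generic operation combines them.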

Development directions. CMU is starting a project at CISL dedicated to building methods to support almost all languages on Earth. That will include collection of data for semi-supervised model training, automatic selection of the dictionary, language modeling and many other interesting things. More updates on this project will appear soon.

Sphinx4-1.0 release. Right now sphinx4 is not as good as sphinx3 for the following reasons:

There is a difference between the results of lextree search in sphinx3 and token search in sphinx4. This difference could easily be tested by comparing a single lextree search in sphinx3 with the search in sphinx4.
There are no good regression tests everyone can run to make sure nothing is broken.
Lattices aren't generated properly and are not really usable (see Long Qin's presentation).
There is no flat search implementation.

The issues here are listed in order of their complexity. Once those issues are resolved we can deprecate sphinx3 and release sphinx4-1.0. There is a concern about using Java for the most accurate recognizer, so we need to run a poll on that issue. We could also suggest pocketsphinx as a sphinx3 replacement for resource-constrained environments.

Documentation. We really want to improve the quality of the documentation. That means we'll try to create consistent online documentation with howtos, video tutorials and many other things, as well as good printed documentation, which is also very important. The following things would be nice to do in the near future:

Create a plan for what should be in a user book. To do that we need to review the most popular books, such as the Weka book and the HTK book. This book is intended for users of CMU Sphinx and developers of ASR applications, not for researchers.
There will soon be a book on ASR research that is closely tied to CMU Sphinx, and that will be the book we suggest to researchers interested in CMU Sphinx. For now "Spoken Language Processing" is the recommended book.
Update the FAQ. We encourage everyone to submit the most frequent questions to the FAQ in order to make it usable as a reference.
Try to clean out old, obsolete information.
Help is required to sort out wiki documents merged from twiki, subwiki and other sources.

Web-Service. Web services, lmtool in particular, have proved to be very successful because of the low entry cost of trying the system. We need to develop the web infrastructure in various ways. Since this requires more control over the system and also more computational resources, we have to set up a cluster to provide services:

Various ASR services like language model service, pronunciation, transcription
Data uploading
Data distribution

We'll also provide a live system image for the CMU Sphinx tools to lower the barrier to trying CMU Sphinx.

Funding. We have a number of things to be done. Since many of them require significant resources, it would be nice to have an organization that is able to fund the development and infrastructure maintenance. Examples of such organizations are 501(c)(3) non-profits like the Apache Software Foundation. Suggestions are welcome.

LIUM suggestions. LIUM is doing amazing work on the CMUSphinx project and we would be glad to have it merged. During the presentation LIUM raised the following issues:

Is it possible to build a common roadmap to anticipate future changes (and to help for future collaboration between all the Sphinx developers)?
Can we work together around a common project (demonstrator, evaluation campaign, or other) in order to federate our efforts to promote the CMU Sphinx project?

On our side there are the following concerns:

License of the LIUM code
Availability of the changes done by LIUM during the development
Research scope of LIUM tools while we also want to target application developers.

We would be glad to discuss those things. A follow-up on this will be posted soon.

Various bits. We'll try to improve sphinx4 as time goes on. Some bits to mention:

Frontend rework to include modern VAD/noise cancellation
Multipass decoding
Development sets introduced in tutorials for optimizing parameters during training.
PLP models by default
MLLR will be used for online adaptation

Mar 25, 2010

CMUSphinx and Google Summer of Code 2010

We applied for the Google Summer of Code 2010 program but were rejected. Anyway, we appreciate Google's contribution to open source development. We wish all accepted projects a successful program this year, and we wish good luck to all the students who will participate.

As for us, we still have a lot of tasks that any newcomer could get their hands on:

https://cmusphinx.github.io/wiki/summerofcodeideas

We would be glad to guide anyone who wants to start on them. If you are a student and want to learn more about speech recognition, this is your chance to jump in. We are also open to sponsorship suggestions for these tasks.

Mar 24, 2010

Sphinx Users And Developers Workshop 2010 Results

So, the CMU Sphinx workshop in Dallas is over. Let us congratulate all participants, especially the submission authors. That was a great event, the room was full! We were amazed by the number of people who attended, their passion and their interest in CMU Sphinx. We would be glad to see more participants next year!

For those who missed the workshop, the papers and some slides are available on the website. You will certainly find something interesting there, like new feature release announcements, application details and new research topics. We didn't forget to support ASR research, of course. The workshop was recorded by many recording devices of various types, and this data will serve as a database for a meeting transcription system.

Of course, the most important part of being at a workshop is face-to-face communication. It was important for us to collect and address the concerns of our users. The main issues noted were the following:

Using CMUSphinx in application development. How to make sure the best possible way is taken.
Using CMUSphinx in research projects. How to get stability guarantees to ensure that work will not be lost or done twice.
Project planning. How to get more information on the project future.

Luckily, the problems above are mostly organizational issues. There were two development meetings after the workshop to address them. Expect a new announcement about them soon.

We would be glad to continue discussions about CMU Sphinx. Please subscribe to the development mailing list https://lists.sourceforge.net/lists/listinfo/cmusphinx-devel. We would be glad to answer your questions and would appreciate your suggestions.

Mar 19, 2010

PocketSphinx 0.6 release

We are pleased to announce the long-awaited PocketSphinx 0.6 release, including SphinxBase 0.6. This release corresponds to SVN revision 9898.

PocketSphinx is a small-footprint continuous speech recognition system, freely licensed under a simplified BSD license, suitable for handheld and desktop applications. It features:

Cross-platform: Linux, Windows, Mac OS X, iPhoneOS
Experimental support for Nokia S60v3 and Windows Mobile
Support for semi-continuous, phonetically-tied, and fully continuous acoustic models
Model footprint on disk of about 10MB per language
Memory footprint under 20MB for medium-vocabulary continuous recognition
Trigram language models and JSGF finite-state grammars
Acoustic models for English and Mandarin
Small language models for English and Mandarin (simplified and traditional characters)
Python language bindings
GStreamer multimedia framework integration

The release branch can be accessed via Subversion at http://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/branches/pocketsphinx-0.6 - this is the preferred way to access the release, particularly if you are using Windows.

This exact release tag can be accessed at http://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/tags/pocketsphinx-0.6

Source code archives are now available for download at http://sourceforge.net/projects/cmusphinx/

Debian/Ubuntu source packages are available from https://launchpad.net/~dhuggins/+archive/cmusphinx
