VocalKit: Shim for Speech Recognition on iPhone

Brian King wrote a blog post a while back on getting PocketSphinx up and running on the iPhone, and last week he got a few emails asking for help.  He was amazing enough to clean up the code and make a little library for it:

http://github.com/KingOfBrian/VocalKit

This should give you a library that statically links the Sphinx libraries and a simple API that connects to Audio Queue.  It also comes with a test program, so you should be able to have a demo up and running very quickly inside Xcode.

The plan is to extend the API to support dynamic language model creation, but the main goal is to get people up and running as soon as possible.  He would appreciate any feedback!

Revision number 10000

Interestingly enough, revision number 10000 was committed to the SVN repository today. We are looking forward to seeing revision number 100k and to delivering you the best ASR experience.

PocketSphinx 0.6 errata, updated Ubuntu packages

A number of minor issues have been found in the PocketSphinx 0.6 release. We will be preparing a 0.6.1 release to address these, but if you are affected by them, you can track the stable branch of the source code at http://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/branches/pocketsphinx-0.6 .

For Ubuntu users, source packages and binary packages for Lucid Lynx (10.04 beta) which track the bug-fixes can be found at https://launchpad.net/~dhuggins/+archive/cmusphinx - or simply add ppa:dhuggins/cmusphinx to your list of software sources.

Development Meeting Notes

There were two great development meetings in Dallas during ICASSP. The goal was to develop a roadmap document about what will happen in the near future. The first meeting was for discussion; the second was for reviewing the statements and making an action plan. The attendees were:

Both meetings:

  • Bhiksha Raj
  • Rita Singh
  • Evandro Govea
  • Nick Shmyrev
  • David Huggins-Daines

Only first meeting:

  • Benoit Favre
  • Jagadeesh Balam
  • Alex Rudnicky (over phone)

The following topics were discussed.

Development directions. A WFST-based decoder has long been awaited as part of the CMUSphinx tools. Such decoders are considered very interesting because language models, acoustic models, the dictionary and even result lattices are unified under a common data framework. Training could be done using the OpenFst tools. Such a framework could appear in the near future.

Development directions. CMU is starting a project at CISL dedicated to building methods to support almost all languages on Earth. It will include collection of data for semi-supervised model training, automatic selection of the dictionary, language modeling and many other interesting things. More updates on this project will appear soon.

Sphinx4-1.0 release. Right now sphinx4 is not as good as sphinx3 for the following reasons:

  • The results of lextree search in sphinx3 and token search in sphinx4 differ. The difference can easily be tested by comparing a single lextree search in sphinx3 with the search in sphinx4.
  • There are no good regression tests that everyone can run to make sure nothing is broken.
  • Lattices aren't generated properly and are not really usable (see Long Qin's presentation).
  • There is no flat search implementation.

The issues here are listed in order of their complexity. Once those issues are resolved we can deprecate sphinx3 and release sphinx4-1.0. There is a concern about using Java for the most accurate recognizer, so we need to run a poll on that issue. We could also suggest pocketsphinx as a sphinx3 replacement for resource-constrained environments.

Documentation. We really want to improve the quality of the documentation. That means we'll try to create consistent online documentation with howtos, video tutorials and many other things, as well as good printed documentation, which is also very important. The following things would be nice to do in the near future:

  • Create a plan for what should be in a user book. To do that we need to review the most popular books, such as the Weka book and the HTK book. This book is intended for users of CMU Sphinx and developers of ASR applications, not for researchers.
  • There will soon be a book on ASR research that is closely tied to CMU Sphinx, and that will be the book we can suggest to researchers interested in CMU Sphinx. For now, "Spoken Language Processing" is the recommended book.
  • Update the FAQ. We encourage everyone to submit the most frequent questions to the FAQ in order to make it usable as a reference.
  • Try to clean out old, obsolete information.
  • Help is required to sort out the wiki documents merged from twiki, subwiki and other sources.

Web-Service. Web services, lmtool in particular, have proved to be very successful because of the low entry cost of trying the system. We need to develop the web infrastructure in various ways. Since this requires more control over the system and also more computational resources, we have to set up a cluster to provide services:

  • Various ASR services like language model service, pronunciation, transcription
  • Data uploading
  • Data distribution

We'll also provide a live system image for the CMU Sphinx tools to lower the barrier to trying CMU Sphinx.

Funding. We have a number of things to be done. Since many of them require significant resources, it would be nice to have an organization that is able to fund development and infrastructure maintenance. Examples of such organizations are 501(c)(3) non-profits like the Apache Software Foundation. Suggestions are welcome.

LIUM suggestions. LIUM is doing amazing work on the CMUSphinx project and we would be glad to have it merged. During its presentation LIUM raised the following issues:

  • Is it possible to build a common roadmap to anticipate future changes (and to help for future collaboration between all the Sphinx developers)?
  • Can we work together on a common project (a demonstrator, an evaluation campaign, or other) in order to federate our efforts and add value to the CMU Sphinx project?

On our side there are the following concerns:

  • License of the LIUM code
  • Availability of the changes done by LIUM during the development
  • Research scope of LIUM tools while we also want to target application developers.

We would be glad to discuss those things. A follow-up on this will be posted soon.

Various bits. We'll try to improve sphinx4 as time goes on. Some bits to mention:

  • Frontend rework to include modern VAD/noise cancellation
  • Multipass decoding
  • Development sets introduced in tutorials for optimizing parameters during training.
  • PLP models by default
  • MLLR will be used for online adaptation