Important: Workshop Date Change

Phew, thank God I didn't book tickets early, as my wife suggested! I hope you didn't either. Anyway, the workshop will be on March 13, so please take note. By the way, the program promises to be extremely interesting.

Preliminary program

1:30-1:45: Coffee

1:45-2:10: A Sphinx Based Speech-Music Segmentation Front-End For Improving The Performance Of An Automatic Speech Recognition System In Turkish
Cemil Demir, TUBITAK-UEKAE; Erdem Ünal, TUBITAK-UEKAE; Mehmet Ugur Dogan, TUBITAK-UEKAE

2:10-2:35: LIUM_SpkDiarization: An Open Source Toolkit For Diarization
Sylvain Meignier, LIUM; Teva Merlin, LIUM

2:35-3:00: Scientific Learning Reading Assistant(TM): CMU Sphinx Technology in a Commercial Educational Software Application
Valerie L. Beattie, Scientific Learning Corporation

3:00-3:15: Coffee break

3:15-3:40: Myovox: A Plug And Play Device Emulating A Mouse And Keyboard Using Speech And Muscle Inputs
Matthew Belgiovine, University of Pennsylvania; Mike DeLiso, University of Pennsylvania; Steve McGill, University of Pennsylvania

3:40-4:05: Some recent research works at LIUM based on the use of CMU Sphinx
Yannick Estève, LIUM; Paul Deléglise, LIUM; Sylvain Meignier, LIUM; Holger Schwenk, LIUM; Loic Barrault, LIUM; Fethi Bougares, LIUM; Richard Dufour, LIUM; Vincent Jousse, LIUM; Antoine Laurent, LIUM; Anthony Rousseau, LIUM

4:05-4:30: Implementing and Improving MMIE training in SphinxTrain
Long Qin, Carnegie Mellon University; Alexander Rudnicky, Carnegie Mellon University

Phonetically Tied Mixtures (with models)

Support for phonetically-tied mixture acoustic models has been added to the Subversion repository for SphinxTrain, Sphinx3, and PocketSphinx.  Briefly, phonetically-tied mixture models are somewhere between semi-continuous and fully-continuous models, offering most of the speed of the former combined with the ability of the latter to effectively use large amounts of training data.
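To make "somewhere between" a little more concrete: in the usual PTM layout, each base phone owns one codebook of Gaussian densities that is shared by all context-dependent states (senones) of that phone, while every senone keeps its own mixture weights over that codebook. The toy Python sketch below (pure standard library; the function names and the dictionary layout are hypothetical, not SphinxTrain or PocketSphinx code) scores senones for one frame and shows where the speed comes from: each phone's Gaussians are evaluated once per frame and reused by all of its senones.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at feature vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a < b:
        a, b = b, a
    if b == -math.inf:
        return a
    return a + math.log1p(math.exp(b - a))


def ptm_senone_scores(frame, codebooks, senones):
    """Score every senone for one frame (hypothetical toy layout).

    codebooks: base phone -> list of (mean, var) Gaussians shared by that phone
    senones:   senone id -> (base phone, mixture weights over that phone's codebook)
    """
    # Evaluate each phone's shared codebook exactly once for this frame.
    densities = {
        phone: [log_gaussian_diag(frame, mean, var) for mean, var in gaussians]
        for phone, gaussians in codebooks.items()
    }
    scores = {}
    for senone_id, (phone, weights) in senones.items():
        # Only the mixture weights are senone-specific.
        total = -math.inf
        for w, log_density in zip(weights, densities[phone]):
            if w > 0.0:
                total = log_add(total, math.log(w) + log_density)
        scores[senone_id] = total
    return scores
```

With one codebook for everything this degenerates to a semi-continuous model; with one codebook per senone it becomes fully continuous. Real decoders add refinements such as top-N Gaussian selection and integer log arithmetic, which this sketch leaves out.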

Parameter settings for training PTM models are included in the template sphinx_train.cfg file created by SphinxTrain; to enable them, set $CFG_HMM_TYPE to ".ptm.". The development version of PocketSphinx recognizes PTM models automatically, while Sphinx3 requires you to add "-senmgau .ptm." to the command line.
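If you script your training setup, the switch can be automated. The Python sketch below is only an illustration: it assumes the generated sphinx_train.cfg contains exactly one uncommented line of the form `$CFG_HMM_TYPE = '.cont.';` (quoting and surrounding comments may differ in your template), and the helper names are hypothetical. The second helper simply appends the "-senmgau .ptm." flag from the text above to whatever Sphinx3 arguments you already use.

```python
import re

# Matches an uncommented assignment such as:  $CFG_HMM_TYPE = '.cont.';
# Commented-out alternatives ("#$CFG_HMM_TYPE = ...") are left untouched.
HMM_TYPE_RE = re.compile(r"""^(\$CFG_HMM_TYPE\s*=\s*)(['"])[^'"]*\2""", re.MULTILINE)


def enable_ptm(cfg_text):
    """Return cfg_text with the active $CFG_HMM_TYPE set to '.ptm.' (quote style kept)."""
    new_text, count = HMM_TYPE_RE.subn(
        lambda m: m.group(1) + m.group(2) + ".ptm." + m.group(2), cfg_text
    )
    if count == 0:
        raise ValueError("no uncommented $CFG_HMM_TYPE assignment found")
    return new_text


def sphinx3_ptm_args(base_args):
    """Append the flag Sphinx3 needs in order to load PTM models."""
    return list(base_args) + ["-senmgau", ".ptm."]
```

To apply it, read the config file, pass its text through enable_ptm, and write the result back.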

We have made PTM models for English and Mandarin available for download on the SourceForge downloads page. These have not been extensively optimized, but the English models, at least, already perform better than comparable fully-continuous models. Compressed and optimized versions of these with 8k bandwidth will be released with PocketSphinx 0.6.

n.b. A dictionary and language model (caution: very large) for Mandarin are also available.

New Model-In-Jar file format

The latest version of Sphinx4 has a new and improved model system.

All acoustic and language models are now loaded as normal files in directories. File paths are specified as URIs and can therefore point anywhere on the Internet. In addition, for convenience, the "resource:" prefix makes Sphinx4 look for the file on the classpath.

Special Models and ModelLoaders are no longer needed, and resource specifiers no longer require the clumsy resource:!/ syntax.
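The lookup rules described above can be illustrated with a small, language-neutral sketch. Sphinx4 does this in Java through its own configuration and classloader machinery; the Python below is only an analogue under stated assumptions: `search_path` stands in for the classpath, "resource:" is assumed to be followed by a path relative to a classpath root, and `resolve_model_location` and its `exists` parameter are hypothetical names.

```python
import os
from pathlib import Path
from urllib.parse import urlparse

RESOURCE_PREFIX = "resource:"


def resolve_model_location(location, search_path, exists=os.path.exists):
    """Map a model location string to a URI string (hypothetical analogue).

    - "resource:/some/dir" is looked up under each search_path entry
      (a stand-in for the Java classpath); the first match wins.
    - Anything that already carries a URI scheme (file:, http:, jar:, ...)
      is returned unchanged, so models may live anywhere on the network.
    - Anything else is treated as a local filesystem path.
    """
    if location.startswith(RESOURCE_PREFIX):
        relative = location[len(RESOURCE_PREFIX):].lstrip("/")
        for root in search_path:
            candidate = os.path.join(root, *relative.split("/"))
            if exists(candidate):
                return Path(os.path.abspath(candidate)).as_uri()
        raise FileNotFoundError(f"{location} not found on search path")
    # A one-letter "scheme" is a Windows drive letter (C:\...), not a URI.
    if len(urlparse(location).scheme) > 1:
        return location
    return Path(os.path.abspath(location)).as_uri()
```

The point of the design is that every location ends up as a URI, so the loader itself needs no special cases for models packaged on the classpath.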

Kudos to Peter

JSGF Refactoring in sphinx4

A major refactoring of the JSAPI part of sphinx4 happened recently. Its roots lie deep in the history of sphinx4. From its birth in the previous century, sphinx4 was meant to support industry standards, in particular the Java Speech API (JSAPI). In fact, sphinx4 was a playground for JSAPI development.

Like any unmaintained code written long ago, the JSAPI support code was rather hard to read and modify. Most importantly, JSAPI structures were used everywhere in it. That would be fine if JSAPI were free and distributed as source, but unfortunately it is not: it is released under a restrictive license that prevents free redistribution. That was the major problem, and I bet you ran into it the first time you started sphinx4 development and forgot to unpack jsapi.jar and accept its license. On top of that, the implementation of the API was incomplete: basically you could only play with grammars, nothing more. No real recognizer API was supported.

Now this situation has changed drastically:

  • The JSGF parser and grammars are now part of sphinx4.jar, free of any licensing issues. You can use JSGF grammars like any other part of the API.
  • The JSAPI implementation is now built on top of sphinx4, which makes it easy to separate, test, and use.
  • The new JSAPI code looks more or less modern.
  • A functional JSAPI-1.0 implementation, including the recognition part, will be here soon.
  • In the near future, a JSAPI-2.0 interface will probably arrive.
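For readers who have not used JSGF grammars yet, the sketch below shows what one looks like and what it means: a small grammar and a toy expander that lists every sentence it accepts. This is not the sphinx4 JSGF parser. It handles only a hypothetical teaching subset (alternatives `|`, grouping `( )`, optional `[ ]`, and non-recursive rule references `<name>`) and ignores weights, tags, `*`/`+` repetition, and imports.

```python
import itertools
import re

# Tokens: rule references, grouping symbols, alternatives, or plain words.
TOKEN_RE = re.compile(r"<[^>]+>|[()\[\]|]|[^\s()\[\]|<>]+")


def parse_rules(grammar_text):
    """Collect {rule name: expansion} from lines like '[public] <name> = ... ;'."""
    return {
        m.group(1): m.group(2)
        for m in re.finditer(r"(?:public\s+)?<([^>]+)>\s*=\s*([^;]*);", grammar_text)
    }


def expand(expansion, rules):
    """Return all word tuples an expansion can produce (assumes no recursive rules)."""
    tokens = TOKEN_RE.findall(expansion)
    phrases, pos = _alternatives(tokens, 0, rules)
    if pos != len(tokens):
        raise ValueError(f"unexpected {tokens[pos]!r} in {expansion!r}")
    return phrases


def _alternatives(tokens, pos, rules):
    # alternatives := sequence ('|' sequence)*
    results, pos = _sequence(tokens, pos, rules)
    while pos < len(tokens) and tokens[pos] == "|":
        more, pos = _sequence(tokens, pos + 1, rules)
        results = results + more
    return results, pos


def _sequence(tokens, pos, rules):
    # sequence := item*  (every combination of the items' choices, in order)
    results = [()]
    while pos < len(tokens) and tokens[pos] not in (")", "]", "|"):
        options, pos = _item(tokens, pos, rules)
        results = [a + b for a, b in itertools.product(results, options)]
    return results, pos


def _item(tokens, pos, rules):
    tok = tokens[pos]
    if tok in ("(", "["):
        close = ")" if tok == "(" else "]"
        options, pos = _alternatives(tokens, pos + 1, rules)
        if pos >= len(tokens) or tokens[pos] != close:
            raise ValueError(f"missing {close!r}")
        if tok == "[":
            options = [()] + options  # optional part may be skipped
        return options, pos + 1
    if tok.startswith("<"):
        return expand(rules[tok[1:-1]], rules), pos + 1
    return [(tok,)], pos + 1


def sentences(grammar_text, rule_name):
    rules = parse_rules(grammar_text)
    return [" ".join(words) for words in expand(rules[rule_name], rules)]


GRAMMAR = """#JSGF V1.0;
grammar commands;
<action> = open | close;
<object> = [the] door;
public <command> = <action> <object>;
"""
```

Here `sentences(GRAMMAR, "command")` yields "open door", "open the door", "close door", and "close the door". A recognizer uses the same structure the other way round, restricting its search to exactly these word sequences.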

Of course, such big architectural changes aren't smooth, and regressions are expected. Don't hesitate to report them :).