Phonetically Tied Mixtures (with models)


Support for phonetically-tied mixture acoustic models has been added to the Subversion repository for SphinxTrain, Sphinx3, and PocketSphinx.  Briefly, phonetically-tied mixture models are somewhere between semi-continuous and fully-continuous models, offering most of the speed of the former combined with the ability of the latter to effectively use large amounts of training data.
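
To make the "somewhere between" description concrete, here is a rough sketch that counts Gaussian densities under the three tying schemes. It assumes a single feature stream, and it assumes the usual arrangement: one codebook shared by all senones for semi-continuous models, one codebook per base phone for PTM, and a separate set of Gaussians per senone for fully-continuous models. The function name and all numbers are made up for illustration and are not the sizes of any released model.

```python
def gaussian_count(model_type, n_senones, n_phones, n_density):
    """Count Gaussian densities for one feature stream under a tying scheme.

    model_type: "semi" (one shared codebook), "ptm" (one codebook per
    base phone), or "cont" (separate Gaussians for every senone).
    n_density: Gaussians per codebook (semi, ptm) or per senone (cont).
    """
    if model_type == "semi":
        return n_density
    if model_type == "ptm":
        return n_phones * n_density
    if model_type == "cont":
        return n_senones * n_density
    raise ValueError("unknown model type: %r" % (model_type,))


# Hypothetical sizes, chosen only to show the relative scale.
semi = gaussian_count("semi", n_senones=5000, n_phones=40, n_density=256)
ptm = gaussian_count("ptm", n_senones=5000, n_phones=40, n_density=128)
cont = gaussian_count("cont", n_senones=5000, n_phones=40, n_density=16)
print(semi, ptm, cont)  # 256 5120 80000
```

With these illustrative numbers the PTM model evaluates far fewer Gaussians than the fully-continuous one, while still having many more than a single shared codebook to fit to the training data.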

Parameter settings for training PTM models are present in the template sphinx_train.cfg file created by SphinxTrain, and can be enabled by setting $CFG_HMM_TYPE to ".ptm.".  The development version of PocketSphinx will automatically recognize PTM models, while Sphinx3 requires you to add "-senmgau .ptm." to the command line.
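
If you script your training setup, the two settings above can be applied along these lines. This is only a sketch: the helper names enable_ptm and add_ptm_flag are hypothetical, and the code assumes that sphinx_train.cfg contains an uncommented line of the form $CFG_HMM_TYPE = '...'; which may not match your copy of the template exactly.

```python
import re

# Matches an uncommented assignment such as:  $CFG_HMM_TYPE = '.cont.';
_HMM_TYPE = re.compile(
    r"^([ \t]*\$CFG_HMM_TYPE[ \t]*=[ \t]*)(['\"]).*?\2([ \t]*;.*)$",
    re.MULTILINE,
)


def enable_ptm(cfg_text):
    """Return the config text with $CFG_HMM_TYPE set to '.ptm.'."""
    new_text, n = _HMM_TYPE.subn(
        lambda m: m.group(1) + "'.ptm.'" + m.group(3), cfg_text
    )
    if n == 0:
        raise ValueError("no uncommented $CFG_HMM_TYPE assignment found")
    return new_text


def add_ptm_flag(decoder_args):
    """Append the Sphinx3 option that selects PTM Gaussian computation."""
    return list(decoder_args) + ["-senmgau", ".ptm."]


# Example input mimicking part of a config file (contents are illustrative).
sample = "#$CFG_HMM_TYPE = '.semi.';\n$CFG_HMM_TYPE = '.cont.'; # Sphinx 3\n"
print(enable_ptm(sample))
```

Commented-out assignments are left alone, so only the active setting changes. PocketSphinx needs no equivalent of add_ptm_flag, since the development version detects PTM models on its own.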

We have made PTM models for English and Mandarin available for download on the SourceForge downloads page.  These have not been extensively optimized, but the English models, at least, already offer better performance than comparable fully-continuous models.  Compressed and optimized versions of these in 8k bandwidth will be released with PocketSphinx 0.6.

N.B. A dictionary and language model (caution: very large) for Mandarin are also available.