GSoC 2012: Grapheme to Phoneme Conversion in sphinx-4 - Project conclusions

Foreword

This article summarizes the Grapheme-to-Phoneme (g2p) in sphinx-4 project, which was part of the GSoC 2012 program and can be thought of as an integration of the phonetisaurus [1] g2p application with both SphinxTrain and Sphinx-4. The project can be divided into three parts: the g2p model training procedure integrated into the SphinxTrain application, the Java g2p decoder integrated into Sphinx-4, and the new Java FST framework, which was created for the project's needs.

The training procedure

The training procedure is based on the original phonetisaurus training procedure, but uses the OpenGrm NGram library instead of the MITLM toolkit. To use it, first install the OpenFst [2] and OpenGrm NGram [3] libraries on your system, then build the SphinxTrain application, passing the --enable-g2p-decoder parameter to the autogen.sh script.
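The build step could be scripted roughly as follows. This is only a sketch: the function name build_sphinxtrain_g2p and its source-directory argument are hypothetical, and it assumes that OpenFst and OpenGrm NGram are already installed and that running make after autogen.sh is how your checkout is built.

```shell
# Sketch: configure and build SphinxTrain with g2p support enabled.
# build_sphinxtrain_g2p is a hypothetical helper, not part of SphinxTrain.
build_sphinxtrain_g2p() {
    src_dir="$1"   # path to a SphinxTrain source checkout (assumed layout)

    if [ ! -x "$src_dir/autogen.sh" ]; then
        echo "autogen.sh not found in $src_dir" >&2
        return 1
    fi

    # Pass the g2p flag to autogen.sh, then build (the make step is an assumption).
    (cd "$src_dir" && ./autogen.sh --enable-g2p-decoder && make)
}
```

It would be called with the location of the checkout, for example build_sphinxtrain_g2p /path/to/sphinxtrain.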

Training an acoustic model by following the instructions at [4] can also train a g2p model. In addition to the steps in [4], after running the sphinxtrain -t an4 setup command you need to enable the g2p functionality by setting the $CFG_G2P_MODEL variable in the training configuration file created by that command to

$CFG_G2P_MODEL= 'yes';

With the g2p functionality enabled, SphinxTrain trains a new g2p model from the provided dictionary during its initial steps, and then uses that model to supply any pronunciations missing from the training transcription file.

The new Java FST framework

To use the generated g2p model in Java, we needed to port the original phonetisaurus decoder to Java. As a first step, we created a general-purpose Java FST framework that can handle FST models generated with the OpenFst library and that provides all the FST functionality and operations required by the g2p decoder.

The Java FST framework is available in the CMUSphinx SVN repository [5].

Using the g2p models in sphinx-4

The g2p model created with SphinxTrain consists of several text-format files (an FST text file and input/output symbol table files), which must first be converted to the Java FST binary format. This can be done with the openfst2java.sh script, which is distributed with the Java FST framework. The script accepts two parameters: the first gives the base location (path and base filename, excluding extensions) of the trained model's text-format files, and the second gives the full path and filename under which the Java FST model will be saved.
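A conversion call might look like the sketch below. The wrapper function convert_g2p_model and the OPENFST2JAVA variable are hypothetical conveniences; only the script name and the meaning of its two parameters come from the description above.

```shell
# Location of the conversion script (assumed; adjust to where the
# Java FST framework is checked out).
OPENFST2JAVA="${OPENFST2JAVA:-./openfst2java.sh}"

# convert_g2p_model is a hypothetical wrapper around the script.
convert_g2p_model() {
    model_base="$1"   # path and base filename of the text-format model, no extension
    java_model="$2"   # full path and filename for the converted Java FST model
    "$OPENFST2JAVA" "$model_base" "$java_model"
}
```

For example, convert_g2p_model /path/to/trained/model /path/to/output/java-model would read the text-format files that share the first base name and write the Java FST model to the second path.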
After the conversion, the Java FST model is enabled by adding g2p properties to the dictionary component in the configuration file.

Note that the "wordReplacement" property should not exist in the dictionary component. The "g2pModelPath" property should contain a URI pointing to the g2p model in Java FST format. The "g2pMaxPron" property holds the number of different pronunciations the g2p decoder generates for each word. More information about sphinx-4 configuration can be found at [6].
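As a rough illustration, the two properties named above could be written as follows inside the dictionary component, assuming the usual sphinx-4 XML property syntax. The URI and the pronunciation count are placeholder values, not settings taken from the project, and the component may need further properties that are not shown here.

```xml
<!-- inside the existing dictionary <component> element -->
<!-- placeholder URI: point it at your converted Java FST model -->
<property name="g2pModelPath" value="file:///path/to/java-fst-model"/>
<!-- placeholder value: number of pronunciations to generate per word -->
<property name="g2pMaxPron" value="2"/>
```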

Conclusion

Beyond the new g2p feature introduced in sphinx-4, the new Java FST framework deserves emphasis. Its use and extensive testing in the sphinx-4 g2p decoder suggest that the implemented functionality is usable in general, although it may lack functionality required by other applications (e.g. additional operations), which in any case should not be hard to implement.

As a final note, this article is only a summary of the work done during the project; an extensive set of documentation is available at the GSoC project page [7].

References

[1] Phonetisaurus: A WFST-driven Phoneticizer

[2] OpenFst Library Home Page

[3] OpenGrm NGram Library

[4] Training Acoustic Model For CMUSphinx

[5] Java FST Framework SVN Repository

[6] Sphinx-4 Application Programmer’s Guide

[7] GSoC 2012: Letter to Phoneme Conversion in CMU Sphinx-4