public class SequiturImport
extends java.lang.Object
Converts an FST from Sequitur G2P's XML format to the Sphinx binary OpenFst format.
Sequitur G2P (http://www-i6.informatik.rwth-aachen.de/web/Software/g2p.html)
provides easy-to-build G2P training facilities. Its binary models can be
converted to an XML FSA format using fsa.py, which is provided with Sequitur.
This program reads the XML and constructs an edu.cmu.sphinx.fst.Fst,
which is then serialized into the Sphinx binary OpenFst format (but could
also be used directly).
NOTICE: Sequitur's fsa.py does not always produce valid XML. Specifically,
it fails to escape the characters &, <, and > as XML character entities
(&amp;, &lt;, and &gt;) if they occur in the training material. If in doubt,
check for these characters in the alphabet portion of the XML and replace
them before running this converter.
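The replacement described in the notice can be sketched as follows. This is only an illustration, not part of the converter: the class name XmlSymbolEscaper and the method escapeSymbol are hypothetical, and the sketch assumes you have already isolated the raw text of a single alphabet symbol. It must not be applied to the whole XML file, since that would also escape the markup itself.

```java
public class XmlSymbolEscaper {

    /**
     * Escapes the three characters that fsa.py may leave unescaped.
     * Hypothetical helper: expects the raw text of one symbol, not markup.
     * The ampersand is replaced first so that the entities produced for
     * '<' and '>' are not escaped a second time.
     */
    static String escapeSymbol(String rawSymbol) {
        return rawSymbol
                .replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(escapeSymbol("<s>")); // prints "&lt;s&gt;"
        System.out.println(escapeSymbol("R&D")); // prints "R&amp;D"
    }
}
```

Note that running the helper on text that is already escaped would escape it again, so it should only be applied to symbols known to be raw.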
Implementation details:
- we add a symbol for <s> to the end of both symbol alphabets
- we increment all state IDs in the states and in the arcs
- we add a new zeroth state which transitions via <s>:<s> to the (new) first state
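The three steps above can be sketched with plain collections. This is a minimal illustration, not the converter's actual code: the class StartStateSketch, the Arc type, and both method names are hypothetical, and the sketch assumes the "(new) first state" is the state that ends up with ID 1 after the shift.

```java
import java.util.ArrayList;
import java.util.List;

public class StartStateSketch {

    /** Hypothetical arc: source state, input symbol ID, output symbol ID, target state. */
    static final class Arc {
        final int from, in, out, to;

        Arc(int from, int in, int out, int to) {
            this.from = from;
            this.in = in;
            this.out = out;
            this.to = to;
        }
    }

    /** Step 1: append <s> to the end of an alphabet and return its symbol ID. */
    static int appendStartSymbol(List<String> alphabet) {
        alphabet.add("<s>");
        return alphabet.size() - 1;
    }

    /**
     * Steps 2 and 3: shift every state ID up by one, then add a new state 0
     * whose single arc consumes <s> and emits <s> on its way to state 1.
     */
    static List<Arc> prependStartState(List<Arc> arcs, int inStart, int outStart) {
        List<Arc> result = new ArrayList<>();
        result.add(new Arc(0, inStart, outStart, 1));
        for (Arc a : arcs) {
            result.add(new Arc(a.from + 1, a.in, a.out, a.to + 1));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> inputAlphabet = new ArrayList<>(List.of("a", "b"));
        List<String> outputAlphabet = new ArrayList<>(List.of("A", "B", "C"));
        int inStart = appendStartSymbol(inputAlphabet);   // 2
        int outStart = appendStartSymbol(outputAlphabet); // 3

        List<Arc> original = List.of(new Arc(0, 0, 1, 1));
        for (Arc a : prependStartState(original, inStart, outStart)) {
            System.out.println(a.from + " -> " + a.to + " (" + a.in + ":" + a.out + ")");
        }
    }
}
```

Appending <s> at the end of each alphabet leaves the IDs of all existing symbols unchanged, so only the state IDs need to be rewritten.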