(author: John Salatas)
Foreword
Following our previous article on phonetisaurus [1] and the decision to use this framework as the g2p conversion method for my GSoC project, this article will describe the port of the dictionary alignment script to C++.
1. Installation
The procedure below is tested on an Intel CPU running openSuSE 12.1 x64 with gcc 4.6.2. Further testing is required for other systems (MacOSX, Windows).
The alignment script requires the openFST library [2] to be installed on your system. Having downloaded, compiled and installed openFST, the first step is to checkout the alignment code from the cmusphinx SVN repository:
$ svn co https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/branches/g2p/train
and compile it
$ cd train
$ make
g++ -c -g -o src/align.o src/align.cpp
g++ -c -g -o src/phonetisaurus/M2MFstAligner.o src/phonetisaurus/M2MFstAligner.cpp
g++ -c -g -o src/phonetisaurus/FstPathFinder.o src/phonetisaurus/FstPathFinder.cpp
g++ -g -l fst -l dl -o src/align src/align.o src/phonetisaurus/M2MFstAligner.o src/phonetisaurus/FstPathFinder.o
$
2. Usage
Having compiled the script, running it without any command line arguments will print out it's usage, which is similar to that of the original phonetisaurus m2m-aligner python script:
$ cd src
$ ./align
Input file not provided
Usage: ./align [--seq1_del] [--seq2_del] [--seq1_max SEQ1_MAX] [--seq2_max SEQ2_MAX]
[--seq1_sep SEQ1_SEP] [--seq2_sep SEQ2_SEP] [--s1s2_sep S1S2_SEP]
[--eps EPS] [--skip SKIP] [--seq1in_sep SEQ1IN_SEP] [--seq2in_sep SEQ2IN_SEP]
[--s1s2_delim S1S2_DELIM] [--iter ITER] --ifile IFILE --ofile OFILE
--seq1_del, Allow deletions in sequence 1. Defaults to false.
--seq2_del, Allow deletions in sequence 2. Defaults to false.
--seq1_max SEQ1_MAX, Maximum subsequence length for sequence 1. Defaults to 2.
--seq2_max SEQ2_MAX, Maximum subsequence length for sequence 2. Defaults to 2.
--seq1_sep SEQ1_SEP, Separator token for sequence 1. Defaults to '|'.
--seq2_sep SEQ2_SEP, Separator token for sequence 2. Defaults to '|'.
--s1s2_sep S1S2_SEP, Separator token for seq1 and seq2 alignments. Defaults to '}'.
--eps EPS, Epsilon symbol. Defaults to ''.
--skip SKIP, Skip/null symbol. Defaults to '_'.
--seq1in_sep SEQ1IN_SEP, Separator for seq1 in the input training file. Defaults to ''.
--seq2in_sep SEQ2IN_SEP, Separator for seq2 in the input training file. Defaults to ' '.
--s1s2_delim S1S2_DELIM, Separator for seq1/seq2 in the input training file. Defaults to ' '.
--iter ITER, Maximum number of iterations for EM. Defaults to 10.
--ifile IFILE, File containing sequences to be aligned.
--ofile OFILE, Write the alignments to file.
$
The two required options are the pronunciation dictionary to align (IFILE) and the file in which the aligned corpus will be saved (OFILE). The script provide default values for all other options and cmudict (v. 0.7a) can be aligned simply by the following command
$ ./align --seq1_del --seq2_del --ifile
allowing for deletions in both graphemes and phonemes
and
$ ./align --ifile
not allowing for deletions
3. Performance
In order to test the new alignment script's performance in both its results and its requirements for cpu and memory usage, I have performed two tests for aligning of the full cmudict (v. 0.7a) allowing deletions in both sequenses:
$ ./align --seq1_del --seq2_del --ifile ../data/cmudict.dict --ofile ../data/cmudict.corpus.gsoc
and compared with the original phonetisuarus script
$ ./m2m-aligner.py --align ../train/data/cmudict.dict -s2 -s1 --write_align ../train/data/cmudict.corpus
3.1. Alignment
Comparing of the two outputs using the linux diff util, didn't result in major differences. Minor differences were noticed in case of the alignment double vowels and consonants with a single phoneme, as in the two following examples:
--- cmudict.corpus
+++ cmudict.corpus.gsoc
....
-B}B O}AO1 R}_ R}R I}IH0 S}S
+B}B O}AO1 R}R R}_ I}IH0 S}S
....
-B}B O}AO1 S|C}SH H}_ E}_ E}IY0
+B}B O}AO1 S|C}SH H}_ E}IY0 E}_
....
3.2. CPU memory usage
In the system described above, the average (of two runs) running time for new aligned command was 1h:14m in comparison to an average of 1h:28m of the original phonetisaurus script. Both scripts consumed the same RAM amount (~ 1.7GB).
Conclusion – Future works
This article presented the new g2p align script which seems to produce the same results as the original one and is a little bit faster than that.
Although it should compile and run as expected to any modern linux system, further testing is reeuired for other systems (like MacOSX, windows). We need also to investigated the alignment differnces (compared to the original script) in the vowels and consonants as described above. Although it doesn't seem critical, it may cause problems later.
References
[1] Phonetisaurus: A WFST-driven Phoneticizer – Framework Review
[2] OpenFst Library, http://www.openfst.org/twiki/bin/view/FST/WebHome
Currently, my goal is to build a web interface which allows users to evaluate their pronunciation. Some of the sub-tasks have already been accomplished, and some of them are still ongoing:
Work accomplished:
Google Summer of Code 2012 officially started this Monday (21 May). Our expected weekly report should begin next Monday, but here is a brief overview of the preparations we have accomplished during the "community bonding period."
Here are the preparations I have accomplished during the bonding period:
My GSoC Project Page: http://www.google-melange.com/gsoc/proposal/review/google/gsoc2012/troylee2008/1
(author: John Salatas)
Foreword
This article, the second in a series regarding, porting openFST to java, briefly presents some additional base classes and raise some issues regarding the java fst architecture in general and its compatibility with the original openFST binary format for saving models.
1. FST java library base architecture
The first article in the series [1] introduced the Weight
Furthermore revision 11363 includes the edu.cmu.sphinx.fst.state.State
2. Architecture design issues
2.1. Java generics support
As described in the first part [1], the edu.cmu.sphinx.fst.weight.Weight
As an example the Arc class definition would be simplified to
public class Arc implements Serializable{
private static final long serialVersionUID = -7996802366816336109L;
// Arc's weight
protected Weight weight;
// Rest of the code.....
}
instead of its current definition
public class Arc
private static final long serialVersionUID = -7996802366816336109L;
// Arc's weight
protected W weight;
// Rest of the code.....
}
The proposed modification can be applied also to State and Fst classes and provide an easier to use api. In that case the construction of a basic FST in the class edu.cmu.sphinx.fst.demos.basic.FstTest would be simplified as follows
// ...
Fst fst = new Fst();
// State 0
State s = new State();
s.AddArc(new Arc(new Weight(0.5), 1, 1, 1));
s.AddArc(new Arc(new Weight(1.5), 2, 2, 1));
fst.AddState(s);
// State 1
s = new State();
s.AddArc(new Arc(new Weight(2.5), 3, 3, 2));
fst.AddState(s);
// State 2 (final)
s = new State(new Weight(3.5));
fst.AddState(s);
// ...
The code could be further simplified by completely dropping generics support in State, Arc and Fst classes by just providing solid implementations based on Weight weights.
2.2. Compatibility with the original openFST binary format
A second issue is the compatibility of the serialized binary format with the original openFST format. A compatible java library that is able to load/save openFST models, would provide us the ability to share trained models between various applications. As an example, in the case of ASR appliactions, trained models could be easily shared between between sphinx4 and kaldi [3] which is written in C++ and already uses the openFST library.
2.3. Logarithmic Semiring implementation issues
A final issue has to do with a possible inconsistency of the plus operation definition between Allauzen's et. Al paper [4] and the actual openFST code (version 1.3.1.): The plus operation ( $latex \oplus_{\log} $ ) is defined in [4] as $latex x \oplus_{\log} y = -\log(e^{-x} +e^{-y}) $, however in code it is implemented as follows
template
inline T LogExp(T x) { return log(1.0F + exp(-x)); }
template
inline LogWeightTpl Plus(const LogWeightTpl &w1,
const LogWeightTpl &w2) {
T f1 = w1.Value(), f2 = w2.Value();
if (f1 == FloatLimits::kPosInfinity)
return w2;
else if (f2 == FloatLimits::kPosInfinity)
return w1;
else if (f1 > f2)
return LogWeightTpl(f2 - LogExp(f1 - f2));
else
return LogWeightTpl(f1 - LogExp(f2 - f1));
}
References
[1] “Porting openFST to java: Part 1”, last accessed: 18/05/2012.
[2] CMUSphinx g2p SVN repository
[3] Kaldi Speech recognition research toolkit , last accessed: 18/05/2012.
[4] C. Allauzen, M. Riley, J. Schalkwyk, W. Skut, M. Mohri, “OpenFst: a general and efficient weighted finite-state transducer library”, Proceedings of the 12th International Conference on Implementation and Application of Automata (CIAA 2007), pp. 11–23, Prague, Czech Republic, July 2007.