edit-distance grammar decoding using sphinx3: Part 2


(Author: Srikanth Ronanki)
(Status: GSoC 2012 Pronunciation Evaluation Week 4)

The source code for the functions below [1] has been uploaded to http://cmusphinx.svn.sourceforge.net/viewvc/cmusphinx/branches/speecheval/ronanki/scripts/
Here are some brief notes on how to use those programs:

Method 1: (phoneme decode)
Path:
neighborphones_decode/one_phoneme/
Steps To Run:
1. Use split_wav2phoneme.py to split a sample wav file into individual phoneme wav files
$ python split_wav2phoneme.py
2. Create split.ctl file using extracted split_wav directory
$ ls split_wav/* > split.ctl
$ sed -i 's/\.wav$//' split.ctl

3. Run feature_extract.sh program to extract features for individual phoneme wav files
$ sh feature_extract.sh
4. Java Speech Grammar Format (JSGF) files are already created in FSG_phoneme
5. Run jsgf2fsg.sh in FSG_phoneme to convert from jsgf to fsg.
$ sh jsgf2fsg.sh
6. Run decode_1phoneme.py to get the required output in output_decoded_phones.txt
$ python decode_1phoneme.py
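
For readers who prefer to build the control file from Python, here is a minimal sketch of step 2. It is not part of the uploaded scripts; the function name write_ctl is my own choice for illustration. It assumes each control-file entry should be the wav path with only its trailing .wav extension removed.

```python
import os


def write_ctl(wav_dir, ctl_path):
    """Write one entry per wav file in wav_dir: the path without '.wav'.

    Hypothetical helper, not one of the uploaded scripts.
    """
    entries = []
    for name in sorted(os.listdir(wav_dir)):
        base, ext = os.path.splitext(name)
        # Keep only wav files; splitext removes just the final extension.
        if ext == ".wav":
            entries.append(os.path.join(wav_dir, base))
    with open(ctl_path, "w") as ctl:
        for entry in entries:
            ctl.write(entry + "\n")
    return entries
```

Calling write_ctl("split_wav", "split.ctl") from the one_phoneme directory would then list every split file once, in sorted order.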

Method 2: (Three phones decode)
Path:
neighborphones_decode/three_phones/
Steps To Run:
1. Use split_wav2threephones.py to split a sample wav file into individual wav files, each consisting of three phones; the two outer phones serve as context for the middle one.
$ python split_wav2threephones.py
2. Create split.ctl file using extracted split_wav directory
$ ls split_wav/* > split.ctl
$ sed -i 's/\.wav$//' split.ctl

3. Run feature_extract.sh program to extract features for the individual three-phone wav files
$ sh feature_extract.sh
4. Java Speech Grammar Format (JSGF) files are already created in FSG_phoneme
5. Run jsgf2fsg.sh in FSG_phoneme to convert from jsgf to fsg
$ sh jsgf2fsg.sh
6. Run decode_3phones.py to get the required output in output_decoded_phones.txt
$ python decode_3phones.py
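
To illustrate the three-phone idea from step 1, here is a rough sketch of how the segment boundaries could be computed. It is not the code in split_wav2threephones.py. It assumes a phone-level alignment is already available as (phone, start, end) tuples in seconds, and that the first and last phones simply use whatever context exists on one side; both are my assumptions, and the function name is hypothetical.

```python
def three_phone_windows(segments):
    """Return (left, centre, right, start, end) for every phone.

    segments: list of (phone, start_sec, end_sec) in time order.
    Hypothetical helper; edge phones get None for the missing neighbour.
    """
    windows = []
    for i, (phone, start, end) in enumerate(segments):
        left = segments[i - 1] if i > 0 else None
        right = segments[i + 1] if i + 1 < len(segments) else None
        # The window spans from the left neighbour's start
        # to the right neighbour's end, when those neighbours exist.
        win_start = left[1] if left else start
        win_end = right[2] if right else end
        windows.append((left[0] if left else None,
                        phone,
                        right[0] if right else None,
                        win_start,
                        win_end))
    return windows
```

Each returned window gives the time span to cut from the original wav file, with the middle phone being the one to evaluate.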

Method 3: (Single/Batch phrase decode)
Path:
neighborphones_decode/phrases/
Steps To Run:
1. Construct the grammar file (JSGF) using my earlier scripts from phonemes2ngbphones [2], then use jsgf2fsg from sphinxbase to convert it from JSGF to FSG. The FSG serves as the input language model for sphinx3_decode.
2. Provide the input arguments (grammar file, features, acoustic models, etc.) for the input test phrase
3. Run decode.sh program to get the required output in sample.out
$ sh decode.sh
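
As a rough picture of the kind of grammar step 1 produces, the sketch below builds a JSGF string in which each phone of the phrase may be replaced by one of its neighboring phones. It is only an illustration: the function name and the neighbor table passed to it are hypothetical, it covers substitutions only, and the scripts in [2] may generate a different or richer grammar.

```python
def neighbor_jsgf(phones, neighbors, name="phrase"):
    """Build a JSGF grammar allowing each phone or one of its neighbours.

    phones: expected phone sequence, e.g. ["K", "AA", "T"].
    neighbors: hypothetical dict mapping a phone to its neighbouring phones.
    """
    alternatives = []
    for phone in phones:
        # The expected phone comes first, followed by its neighbours.
        options = [phone] + [n for n in neighbors.get(phone, []) if n != phone]
        alternatives.append("( " + " | ".join(options) + " )")
    return "#JSGF V1.0;\ngrammar {0};\npublic <{0}> = {1};\n".format(
        name, " ".join(alternatives))
```

Writing the returned string to a .jsgf file gives an input that can then be converted to FSG as described in step 1.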

References:

[1] edit-distance grammar decoding using sphinx3: Part 1

[2] Input string of phonemes to CMUBet neighboring phones