CMUSphinx Open Source Speech Recognition

Jun 6, 2011

Long Audio Training: Update 2

The past-days-work addressed the usage of restrictive function read_line in the SphinxTrain. All occurrences of read_line were eliminated making use of line iterators from sphinxbase. During this a decission to modify the lineiter interface was made in order to support original read_line functionality, e.g. comments skipping and whitespace trimming. This now takes following methods:

lineiter_init - init iterator for reading without any preprocessing
lineiter_init_clean - init iterator for reading compatible with the read_line function: skip commented lines and trim leading and trailing whitespaces
lineiter_next - read next line from the file
lineiter_free - finish reading and free resources

Usages of line iterators in the sphinxbase were also updated to comply with the interface modifications. These changes were necessary to enable SphinxTrain training on audio files of unlimited size and the lineiter is meant to be central input interface in the SphinxTrain in the future.

Work on the examination of memory issues is ongoing and will be followed by the implementation of memory-optimized Baum-Welch algorithm as described at https://cmusphinx.github.io/wiki/longaudiotraining as well as finding other ways to reduce unreasonable memory demands of current SphinxTrain version.

May 31, 2011

Long Audio Alignment : Week 2

As indicated in my last update regarding Long Audio Alignment project (https://cmusphinx.github.io/2011/05/long-audio-alignment-week-1/), my attempt this week was to fix the problem of generating pronunciations for out of vocabulary (OOV) words. Due to lack of a reliable Java based Automata library, I added an existing phone generator from FreeTTS to Sphinx 4 to generate pronunciation hypothesis for OOV words.
The module for generating these hypothesis is modeled to ensure correct pronunciation hypothesis for:

Abbreviations: Words like "USD" in transcription could have a equivalent utterance as both as United States Dollar and U-S-D. Accuracy of this depends on the model used.
Numbers : "123" in text can have a equivalent utterance as "One Two Three" as well as "One Hundred Twenty Three"
OOV words that are neither abbreviation nor numbers are pronounced as it is ( i.e. default pronunciation generated by FreeTTS).

A branch long-audio-alignment has been created on SVN. Source files for this project can be found there.

With the current state of the Aligner model (i.e. with pronunciations for all words in the dictionary), the word error rate (WER) was found to have improved to 0.16 as compared to 0.18 without the pronunciation generator.

After a few quick experiments with grammars again, we next aim to model anchors based on trigram language model.

May 30, 2011

Long Audio Training: Update 1

Here comes the first update on the Long Audio Training project. It's aim is to enable SphinxTrain training on recordings hours long. Presently the SphinxTrain can process files up to 3 minutes approx.

Full info on the project can be found at https://cmusphinx.github.io/wiki/longaudiotraining.

During the last week a collection of audio files 5 to 10 minutes long have been turned into CMUSphinx training database in order to determine possible issues when training on longer recordings. The first experiments resulted in two main findings:

Some of the components of SphinxTrain put an arbitrary limit to the size of data they process. E.g. the problem is with the function read_line which reads a line only up to the constant size of a buffer. This implementation resulted in crashes of training process in the baum-welch step due to the failed word lookup.
Another finding is that the training process currently requires a huge amout of memory. This takes about 1.7GB of RAM when trainning on ~5 minutes-long recording and much over 4GB processing ~10 minutes-long input. (I did not determine the actual value because this actually shot down my machine.) This indicates there is a flaw in the memory management of the training process and will be subject to examination in the following days.

A branch long-audio-training was created in the CMUSphinx SVN repository. All work done on this project will be commited into this branch.

First change is the fix for the word-lookup problem which was caused by the truncation of transcription sentence and thus it's last word. The idea is to replace all usages of read_line in SphinxTrain with the lineiter_* set of functions from sphinxbase, which do not impose any such limits on the length of the data.

More update to come soon, stay tuned!

May 26, 2011

Building Pocketsphinx On Android

Update:

Windows users, please check more or less updated manual here:

https://sites.google.com/site/opiatefuchs/home/pocketsphinxandroiddemo

I know you waited for the instruction for a long time, here it is. Cudos to Matthew Cooper who wrote this. These instruction should work on any GNU/Linux distributuion, for example on Fedora or Ubuntu. This is also written for phones have OS 2.2 or higher. Previous OS requires a different path to the sdcard, so it will be easy to adapt this guide.

Download and build pocketsphinx

To run pocketsphinx download the following from https://cmusphinx.github.io/wiki/download/

sphinxbase-0.8
pocketsphinx-0.8

Then download PocketSphinxDemo archive from
PocketsphinxAndroidDemo - snapshot

Unzip all three folders into a place where you will remember to find them. It is necessary to unzip both pocketsphinx and sphinxbase folders into the same parent directory and rename them to just "sphinxbase" and "pocketsphinx" without version information.

Open up the terminal and type:

sudo -i

followed by your password. This will give you root access.

You will need swig later, so let's install it. You need swig 1.3, for now we do not support newer swig like 2.0:

apt-get install swig

yum install swig

On Windows install swig according to the following document:

http://www.swig.org/Doc1.3/Windows.html

Now, cd into sphinxbase and run the following commands from command line:

./configure make make install

cd into pocketsphinx and type:

./configure make make install

On Windows you do not need to compile Sphinxbase, just unpack it.

Now cd into PocketSphinxDemo/jni folder

Open the Android.mk file, found in the jni folder, and change the SPHINX_PATH(line #5) to the parent folder holding pocketsphinx and sphinxbase.

from command line type:

the-path-to-your-ndk-folder/ndk-build -B

Of course, substitude the real path to your ndk folder for the-path-to-your-ndk-folder

Eclipse

Now open Eclipse and import the PocketSphinxDemo.

In the Navigator View look for PocketSphinxDemo project. Right click on it and select properties. The properties screen will pop up and you will need to select Builders. In the Builders screen you will see SWIG and NDK build.

Click on NDK build and edit.

In the edit screen change the field Location to point to your ndk-folder you have on your machine. Click on the Refresh tab and select "The project containing the selected resource"
Click on the Build Options tab and deselect "Specify working set of relevant resources"
Apply changes and exit the configuration for NDK build.

Click on SWIG and edit.

You will not need to change the Location since downloaded swig at the start of the tutorial.
In the refresh tab select "The folder containing the selected resource"
In the Build Options tab deselect "Specifiy working set of relevant resources"
Apply changes and exit the configuration for SWIG.

Phone

Connect to your android phone and create the edu.cmu.pocketsphinx folder at

/mnt/sdcard

You can do this by opening terminal command and typing:

adb shell

In shell type:

mkdir /mnt/sdcard/edu.cmu.pocketsphinx

now cd into the edu.cmu.pocketsphinx folder that is located on your phone and create the following folder structure:

edu.cmu.pocketsphinx | ----> hmm | | | ----> en_US | | | ----> hub4wsj_sc_8k | ----> lm | -----> en_US

Now type quit to leave adb shell.

While still in terminal you will need to push files from your computer onto the phone.
cd into pocketsphinx/model/hmm/en_US/ and type:

adb push ./hub4wsj_sc_8k /mnt/sdcard/edu.cmu.pocketsphinx/hmm/en_US/hub4wsj_sc_8k

Now cd into pocketsphinx/model/lm/ in your and type:

adb push ./en_US /mnt/sdcard/edu.cmu.pocketsphinx/lm/en_US

Now open the RecognizerTask.java found in
/src/edu/cmu/pocketsphinx/demo

There are declared paths to a structure that is not valid on a 2.2 phone. We will need to change the paths so that they work correctly. Here is my code for the section.

pocketsphinx.setLogfile("/mnt/sdcard/edu.cmu.pocketsphinx/pocketsphinx.log"); Config c = new Config(); /* * In 2.2 and above we can use getExternalFilesDir() or whatever it's called */ c.setString("-hmm", "/mnt/sdcard/edu.cmu.pocketsphinx/hmm/en_US/hub4wsj_sc_8k"); c.setString("-dict", "/mnt/sdcard/edu.cmu.pocketsphinx/lm/en_US/hub4.5000.dic"); c.setString("-lm", "/mnt/sdcard/edu.cmu.pocketsphinx/lm/en_US/hub4.5000.DMP"); c.setString("-rawlogdir", "/mnt/sdcard/edu.cmu.pocketsphinx"); // Only use it to store the audio

If your model is different, you will have to use different paths, for example fo rthe mandarin model
c.setString("-hmm", "/sdcard/Android/data/edu.cmu.pocketsphinx/hmm/zh/tdt_sc_8k"); c.setString("-dict", "/sdcard/Android/data/edu.cmu.pocketsphinx/lm/zh_TW/mandarin_notone.dic"); c.setString("-lm", "/sdcard/Android/data/edu.cmu.pocketsphinx/lm/zh_TW/gigatdt.5000.DMP");

Now build and run the project.

Newer

Older

Page 28 of 37