CMUSphinx Documentation

This page contains collaboratively developed documentation for the CMU Sphinx speech recognition engines.

Beginner User Documentation

This section contains links to documents which describe how to use Sphinx to recognize speech. Currently, we have very little in the way of end-user tools, so it may be a bit sparse for the foreseeable future.

Tutorial: Getting started with CMUSphinx for developers

In trouble? Read the FAQ

Advanced User Documentation

These documents either describe some particular aspect of the Sphinx codebase in detail, or they serve as a developer’s guide to accomplishing some particular task.

Building: Building Pocketsphinx on various platforms
AsteriskDetails: How to use Pocketsphinx in Asterisk
DecoderTuning: How to tune the decoder to be fast (or rather, not horribly slow)
PocketsphinxHandhelds: Pocketsphinx optimizations for embedded devices (the Pocketsphinx counterpart of DecoderTuning)
PhonemeRecognition: How to use Pocketsphinx for phoneme recognition
SpeakerDiarization: Using LIUM tools for speech segmentation and speaker diarization
LDAMLLT: How to train acoustic models with LDA and MLLT feature transforms
GStreamer: How to use Pocketsphinx with GStreamer and Python
InstallingPythonStuff: How to install Python and necessary modules for SphinxTrain development
MMIE_Train: How to perform MMIE training.
Installing on Raspberry Pi

How To Contribute

Please consider the project ideas listed on ProjectIdeas; some are easy, some are harder. If you want to start work on any of them, please let us know.

Reference

These documents describe APIs in excruciating detail, or provide other useful background information for CMUSphinx developers.

Developer Documentation

This section contains various internal information for CMUSphinx developers. We hope it will still be useful to you.

Sphinx-4 Regression Tests: How to run regression tests
SphinxTrainWalkthrough: An overview of the SphinxTrain source code for researchers and developers
CMUCLMTK development: Development guide for the CMU-Cambridge Language Modeling Toolkit.
CodingStyle: Coding style for SphinxBase, SphinxThree, and SphinxTrain
ReleaseSchedule: Plans for upcoming releases of Sphinx
Release Check List: How to make a release
Web Site Layout: How to organize information
Sphinx4 Space: Information about Sphinx-4, including its design, code, performance, and history

File formats

Data sources

Available data sources are covered on the SpeechData page.

Speech Recognition Theory

This section collects research ideas for specific problems in speech recognition.