CMUSphinx Tutorial For Developers

Introduction

This tutorial describes some applications of the CMUSphinx toolkit. These include voice control of mobile, desktop, or automotive applications, language learning, speech transcription, closed captioning, speech translation, and voice search. All of these applications are possible with CMUSphinx, but modern toolkits such as Kaldi, Coqui, NeMo, Wav2vec2, Whisper, and whisper.cpp will perform much better on large-vocabulary tasks.

The tutorial is intended for developers who need to apply speech technology in their applications, not for speech recognition researchers. If you are a researcher, we recommend starting with a textbook on speech technologies. Spoken Language Processing by Huang, Acero, and Hon is a good choice.

This tutorial is structured as follows:

Basic concepts of speech recognition
Overview of the CMUSphinx toolkit
Before you start
Building an application with sphinx4
Building an application with pocketsphinx
Using PocketSphinx on Android
Building a dictionary
Building a language model
Adapting an existing acoustic model
Training an acoustic model
Tuning the performance