CMUSphinx Tutorial For Developers
Introduction
This tutorial describes some applications of the CMUSphinx toolkit. Such applications include voice control of mobile, desktop or automotive applications, language learning, speech transcription, closed captioning, speech translation, and voice search. While all of these applications are possible with CMUSphinx, modern toolkits such as Kaldi, Coqui, NeMo, Wav2vec2, Whisper and whisper.cpp will perform much better on larger-vocabulary tasks.
The tutorial is intended for developers who need to apply speech technology in their applications, not for speech recognition researchers. If you are a researcher, we recommend starting with a textbook on speech technologies. Spoken Language Processing by Huang, Acero and Hon is a good choice.
The tutorial is structured as follows:
- Basic concepts of speech recognition
- Overview of the CMUSphinx toolkit
- Before you start
- Building an application with sphinx4
- Building an application with pocketsphinx
- Using PocketSphinx on Android
- Building a dictionary
- Building a language model
- Adapting an existing acoustic model
- Training an acoustic model
- Tuning the performance