# PocketSphinx 5.0.0: A small speech recognizer

PocketSphinx API Documentation

Version: 5.0.0
Date: October 5, 2022

# Introduction

This is the documentation for the PocketSphinx speech recognition engine. The main API calls are documented in ps_decoder_t and ps_config_t. The organization of this document is not optimal due to the limitations of Doxygen, so if you know of a better tool for documenting object-oriented interfaces in C, please let me know.

# Installation

To install from source, you will need a C compiler and a recent version of CMake. If you wish to use an integrated development environment, Visual Studio Code will automate most of this process for you once you have installed C++ and CMake support as described at https://code.visualstudio.com/docs/languages/cpp.

The easiest way to program PocketSphinx is with the Python module. See http://pocketsphinx.readthedocs.io/ for installation and usage instructions.

## Unix-like systems

From the top-level source directory, use CMake to generate a build directory:

```sh
cmake -S . -B build
```


Now you can compile and run the tests, and install the code:

```sh
cmake --build build
cmake --build build --target check
cmake --build build --target install
```


By default CMake will try to install things in /usr/local, which you might not have access to. If you want to install somewhere else you need to set CMAKE_INSTALL_PREFIX when running cmake for the first time, for example:
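A typical invocation (the `~/.local` prefix here is only an illustration; use whatever directory you have write access to):

```sh
cmake -S . -B build -DCMAKE_INSTALL_PREFIX=$HOME/.local
```

Note that CMake caches this setting, so if you have already configured the build directory, it is simplest to delete it and rerun the command above.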



## There is literally no output!

If by this you mean it doesn't spew copious logging output like it used to, you can solve this by passing -loglevel INFO on the command line, setting the loglevel parameter to "INFO", or calling err_set_loglevel() with ERR_INFO.
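In C, the latter two options look something like this (a sketch only; it assumes you are creating a configuration as usual and links against the PocketSphinx library):

```c
#include <pocketsphinx.h>

int main(int argc, char *argv[])
{
    /* Option 1: set the loglevel parameter in the configuration. */
    ps_config_t *config = ps_config_init(NULL);
    ps_config_set_str(config, "loglevel", "INFO");

    /* Option 2: set the log level globally and directly. */
    err_set_loglevel(ERR_INFO);

    ps_config_free(config);
    return 0;
}
```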

If you mean that you just don't have any recognition result, you may have forgotten to configure a dictionary. Or see below for other reasons the output could be blank.
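As a sketch, here is what configuring a dictionary explicitly looks like in C (the file path is a placeholder for your own pronunciation dictionary):

```c
#include <pocketsphinx.h>

int main(int argc, char *argv[])
{
    ps_config_t *config = ps_config_init(NULL);
    /* Fill in default model files for anything not set explicitly. */
    ps_default_search_args(config);
    /* Placeholder path: point this at your actual dictionary file. */
    ps_config_set_str(config, "dict", "/path/to/your.dict");
    ps_decoder_t *decoder = ps_init(config);
    if (decoder == NULL) {
        /* Initialization failed; the log output will say why. */
        ps_config_free(config);
        return 1;
    }
    ps_free(decoder);
    ps_config_free(config);
    return 0;
}
```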

## Why doesn't my audio device work?

Because it's an audio device. They don't work, at least for things other than making annoying "beep boop" noises and playing Video Games. More generally, I cannot solve this problem for you, because every single computer, operating system, sound card, microphone, phase of the moon, and day of the week is different when it comes to recording audio. That's why I suggest you use SoX, because (a) it usually works, and (b) whoever wrote it seems to have retired long ago, so you can't bother them.
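For example, to record from the default device with SoX and pipe the audio into the pocketsphinx command-line tool (this assumes the `pocketsphinx` program from this package is on your PATH and that your version supports the soxflags subcommand, which prints the matching SoX format arguments):

```sh
sox -d $(pocketsphinx soxflags) | pocketsphinx -
```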

## The recognized text is wrong.

That's not a question! But since this isn't Jeopardy, and my name is not Watson, I'll try to answer it anyway. Be aware that the answer depends on many things, first and foremost what you mean by "wrong".

If it sounds the same, e.g. "wreck a nice beach" when you said "recognize speech" then the issue is that the language model is not appropriate for the task, domain, dialect, or whatever it is you're trying to recognize. You may wish to consider writing a JSGF grammar and using it instead of the default language model (with the jsgf parameter). Or you can get an N-best list or word lattice and rescore it with a better language model, such as a recurrent neural network or a human being.
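For instance, a minimal JSGF grammar for a toy command-and-control task (the grammar and rule names here are made up for illustration) could look like:

```
#JSGF V1.0;
grammar commands;
public <command> = (turn on | turn off) (the light | the fan);
```

You would then pass this file via the jsgf parameter (e.g. -jsgf commands.gram on the command line) in place of the default language model.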

If it is total nonsense, or if it is just blank, or if it's the same word repeated, e.g. "a a a a a a", then there is likely a problem with the input audio. The sampling rate could be wrong, or even if it's correct, you may have narrow-band data. Try looking at the spectrogram (Audacity can show you this) and see if it looks empty or flat below the frequency given by the upperf parameter. Alternatively, it could just be very noisy. In particular, if the noise consists of other people talking, automatic speech recognition will nearly always fail.
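If the sampling rate turns out to be the problem, SoX can resample and downmix the file; a typical conversion to 16 kHz, mono, 16-bit (the format the usual acoustic models expect) is:

```sh
sox input.wav -r 16000 -c 1 -b 16 output.wav
```

Note that resampling cannot restore frequencies that were never recorded: upsampling narrow-band (e.g. 8 kHz telephone) audio to 16 kHz will not make it wide-band.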

## Why don't you support (pick one or more: WFST, fMLLR, SAT, DNN, CTC, LAS, CNN, RNN, LSTM, etc)?

Not because there's anything wrong with those things (except LAS, which is kind of a dumb idea) but simply because PocketSphinx does not do them, or anything like them, and there is no point in adding them to it when other systems exist. Many of them are also heavily dependent on distasteful and wasteful platforms like C++, CUDA, TensorFlow, PyTorch, and so on.

# Acknowledgements

PocketSphinx was originally released by David Huggins-Daines, but is largely based on the previous Sphinx-II and Sphinx-III systems, developed by a large number of contributors at Carnegie Mellon University, and released as open source under a BSD-like license thanks to Kevin Lenzo. For some time, it was maintained by Nickolay Shmyrev and others at Alpha Cephei, Inc. See the AUTHORS file for a list of contributors.