There is also an updated release of SphinxTrain, and the acoustic modeling tutorial has been updated to reflect the new and simplified usage. Still working on the other tutorials, sorry.
To quote the release notes, this release fixes a few long-standing bugs in SphinxTrain and makes the package (hopefully) easier to use. Among other things:
- The dependency on SphinxBase is gone, because SphinxBase is gone
- The dependency on Sphinx3 for VTLN and force-alignment is gone (
sphinx3_align
is included) - Multi-CPU training actually works, tested on up to 64 CPUs with LibriSpeech, much easier than setting up PBS on the Clown
- The dependency on Visual Studio for buliding on Windows is gone (but please just use WSL, please)
- The dependency on Autotools is gone (CMake ain’t great but it’s much less bad)
- There is a Dockerfile now
- There is “continuous integration” now (sort of)
- The
-remove_silence
option has been disabled by default (unlike in PocketSphinx you can still turn it on if you really want to, it might save you a bit of time in training) - It is not necessary to install SphinxTrain system-wide to run training
- G2P support has been updated for the most commonly installed version of OpenFST (do not try to use any other version, because C++, that’s why)
- Pick Decoding Model Based on Context Dependence by @Mazyod
- Output an error message when we cannot execute a tool by @cshung
- Make an option in config for not folding case in phonemes by @lenzo-ka
- Use consistent shebang for python by @acgrobman
- Add -sox flag to sphinx_fe to convert files with SoX by @dhdaines
- Update and enable G2P code by @dhdaines
- Librispeech training template by @dhdaines
You can download it from the release page
Or clone it (shallowly) with git
:
git clone --depth 1 --branch v5.0.0 https://github.com/cmusphinx/sphinxtrain
Pull requests and bug reports and such are welcome via https://github.com/cmusphinx/sphinxbase.