A new, updated English language model is available for download on our new Torrent tracker.
This is a good trigram language model for general transcription, trained on various open sources, for example Project Gutenberg texts.
It achieves good transcription performance on various types of
texts; for example, on the following test sets the perplexities are:
Perplexity: 158.3
Perplexity: 206.677
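For readers unfamiliar with the metric, perplexity is the inverse geometric mean of the probabilities the model assigns to each word of a test text, so lower is better. The following is a minimal sketch of that calculation, assuming you already have per-word log10 probabilities (the form used in ARPA-format models). The function name `perplexity` and the example values are hypothetical and are not part of any toolkit; real tools also differ in how they count sentence-end tokens and out-of-vocabulary words.

```python
import math


def perplexity(log10_probs):
    """Return perplexity from a list of per-word log10 probabilities.

    Illustrative helper, not taken from any language modeling toolkit.
    """
    if not log10_probs:
        raise ValueError("need at least one word probability")
    # Average log10 probability per word.
    average = sum(log10_probs) / len(log10_probs)
    # Perplexity is 10 raised to the negative average log10 probability.
    return 10 ** (-average)


# Made-up example: four words, each assigned probability 0.25.
# A model that is equally unsure among 4 choices has perplexity close to 4.
example = [math.log10(0.25)] * 4
print(perplexity(example))
```

Read this way, a perplexity of 158.3 means the model is, on average, about as uncertain as if it were choosing uniformly among roughly 158 words at each position.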
Besides general transcription, this model should be significantly better on conversational data such as movie transcripts.
The language model was pruned with a beam of 5e-9 to reduce its size. It can be pruned further if needed, or the vocabulary can be reduced to fit the target domain.
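As a rough illustration of the vocabulary reduction idea, the sketch below keeps only those n-grams whose words all belong to a target vocabulary. It is a simplified assumption of how the filtering step could look, not the procedure used to build this model: the n-gram table is a plain dictionary rather than a parsed model file, the name `restrict_to_vocabulary` and the special tokens are hypothetical, and the remaining probabilities and backoff weights are not renormalized, which a proper language modeling toolkit would do.

```python
# Tokens usually kept regardless of the target vocabulary (assumed names).
SPECIAL_TOKENS = {"<s>", "</s>", "<unk>"}


def restrict_to_vocabulary(ngrams, vocabulary):
    """Keep n-grams made up only of allowed words.

    ngrams: dict mapping a tuple of words to a log10 probability.
    vocabulary: iterable of words in the target domain.
    Note: no renormalization is performed here.
    """
    allowed = set(vocabulary) | SPECIAL_TOKENS
    return {
        words: logp
        for words, logp in ngrams.items()
        if all(word in allowed for word in words)
    }


# Made-up toy table with unigrams, a bigram and a trigram.
toy_ngrams = {
    ("hello",): -1.2,
    ("world",): -1.5,
    ("zebra",): -4.0,
    ("hello", "world"): -0.7,
    ("<s>", "hello", "zebra"): -2.3,
}

reduced = restrict_to_vocabulary(toy_ngrams, ["hello", "world"])
print(sorted(reduced))
```

In the toy table, every entry containing "zebra" is dropped because that word is outside the target vocabulary.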