CMUSphinx Powers Video Subtitle Editing Tool

Making subtitles from scratch usually consists of two tedious tasks: (1) figuring out the times when someone starts and stops speaking, in subtitle-length pieces, and (2) typing out the text corresponding to that speech. An approach often worth attempting is to automate a large part of this work by using speech recognition to generate subtitles from a given video. This method cannot be expected to produce release-quality subtitles on its own, but it should provide a rough first draft that can be finished by the usual manual methods. With most video sources, the recognized text cannot be expected to be accurate, but detecting when speech occurs should still provide decent results for the start and end times of subtitles.
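To make the "rough first draft" idea concrete, here is a minimal sketch of turning detected speech into timed subtitle entries. It is not how Gaupol or pocketsphinx detects speech; it swaps in a plain energy threshold over audio frames purely for illustration. The function names, the per-frame energy input, the threshold value, and the choice of SubRip (SRT) as the output format are all assumptions of this example.

```python
from typing import List, Optional, Tuple


def find_speech_segments(
    energies: List[float],
    frame_seconds: float,
    threshold: float,
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds for runs of frames at or above threshold.

    Hypothetical stand-in for real speech detection: a frame counts as
    speech when its energy reaches the threshold.
    """
    segments = []
    start_frame = None
    for index, energy in enumerate(energies):
        if energy >= threshold and start_frame is None:
            start_frame = index  # speech begins at this frame
        elif energy < threshold and start_frame is not None:
            segments.append((start_frame * frame_seconds, index * frame_seconds))
            start_frame = None  # speech ended just before this frame
    if start_frame is not None:
        # Speech continued until the end of the audio.
        segments.append((start_frame * frame_seconds, len(energies) * frame_seconds))
    return segments


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def draft_srt(
    segments: List[Tuple[float, float]],
    texts: Optional[List[str]] = None,
) -> str:
    """Build a draft SRT document; text is left empty when none is supplied."""
    blocks = []
    for number, (start, end) in enumerate(segments, start=1):
        text = texts[number - 1] if texts else ""
        blocks.append(
            f"{number}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)
```

For example, `draft_srt(find_speech_segments(energies, 0.5, 0.5))` produces numbered entries with start and end times and empty text, which is the kind of draft that is then finished by hand. A real recognizer would also fill in a first guess at the text for each entry.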

The speech recognition in the Gaupol subtitle editor uses the excellent CMU Sphinx speech recognition toolkit developed at Carnegie Mellon University; to be exact, it uses the pocketsphinx plugin for the GStreamer multimedia framework.

Check it out