Long Audio Alignment : Week 3


The last update reported an error rate of almost 16% for sufficiently large audio files. Experiments were conducted to pinpoint the source of these errors. It was then suggested that the audio be aligned without first classifying it into speech and non-speech components. Alignment in this configuration is now being tested with different grammars, each aimed at a different sort of possible error in the audio and/or the transcription.
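For readers unfamiliar with how such an error rate is usually computed, here is a minimal sketch of word error rate as word-level edit distance divided by the reference length. It assumes the figure above is a word error rate, which is the measure named later in this post; the function name `word_error_rate` and the example strings are illustrative only and are not taken from the aligner's code.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Substitutions, insertions and deletions are each given a cost of one here, which is the common convention but still an assumption about how the reported numbers were obtained.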

Audio with a perfect transcription, for utterances up to 20 minutes long, has been checked for alignment with the current state of the aligner, and this resulted in a word error rate close to 0%. The grammar used for this sort of alignment only allows a transition from a word to its immediate successor in the transcription.
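The strictly linear grammar described above can be pictured as a chain of states with a single arc out of each one. The sketch below is an illustration in plain Python, not the aligner's actual grammar code; `build_linear_grammar` and `accepts` are hypothetical names chosen for this example.

```python
def build_linear_grammar(transcript):
    """Build a chain grammar: state i means 'i words consumed so far'.

    The only arc leaving state i is labelled with word i and leads to
    state i + 1, so no word can be skipped, repeated or reordered.
    """
    words = transcript.split()
    arcs = {i: [(word, i + 1)] for i, word in enumerate(words)}
    return arcs, len(words)  # the final state is the number of words


def accepts(arcs, final_state, spoken_words):
    """Return True if the word sequence follows the chain to the final state."""
    state = 0
    for word in spoken_words:
        targets = [dst for label, dst in arcs.get(state, []) if label == word]
        if not targets:
            return False
        state = targets[0]
    return state == final_state


if __name__ == "__main__":
    arcs, final = build_linear_grammar("the cat sat")
    print(accepts(arcs, final, ["the", "cat", "sat"]))
    print(accepts(arcs, final, ["the", "sat"]))
```

Because every state has exactly one outgoing arc, such a grammar can only succeed when the utterance matches the transcription word for word, which is consistent with it working well on perfect transcriptions.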

We are currently classifying the different sorts of errors that occur in the transcription and the utterance, and modelling grammars that allow alignment in the presence of each.
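As a rough illustration of what modelling a grammar for an error class could look like, the sketch below relaxes a word-to-successor chain with skip arcs. It assumes one hypothetical error class, a speaker omitting up to `max_skip` consecutive transcript words; this post does not state which error classes are being handled, and the names `build_skip_grammar`, `accepts` and `max_skip` are invented for this example.

```python
def build_skip_grammar(transcript, max_skip=1):
    """Chain grammar with extra arcs that jump over omitted words.

    State i means 'the first i transcript words are accounted for'.
    From state i, speaking word j (i <= j <= i + max_skip) leads to
    state j + 1, treating words i..j-1 as omitted by the speaker.
    In this sketch the last transcript word can never be skipped.
    """
    words = transcript.split()
    n = len(words)
    arcs = {}
    for i in range(n):
        last = min(i + max_skip, n - 1)
        arcs[i] = [(words[j], j + 1) for j in range(i, last + 1)]
    return arcs, n


def accepts(arcs, final_state, spoken_words):
    """Track every reachable state, since repeated words make arcs ambiguous."""
    states = {0}
    for word in spoken_words:
        states = {dst
                  for s in states
                  for label, dst in arcs.get(s, [])
                  if label == word}
        if not states:
            return False
    return final_state in states


if __name__ == "__main__":
    arcs, final = build_skip_grammar("the quick brown fox", max_skip=1)
    print(accepts(arcs, final, ["the", "brown", "fox"]))  # one word omitted
    print(accepts(arcs, final, ["the", "fox"]))           # two words omitted
```

Other error classes, such as repeated or inserted words, would need different extra arcs; the point here is only that each class corresponds to a specific relaxation of the linear chain.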