Long Audio Training - reduced B-W computation & move towards CUDA – CMUSphinx Open Source Speech Recognition

It's been a while since my last post. Over the past few weeks I have been modifying the Baum-Welch algorithm into its reduced version, which is finally complete.

Forward, backward and Viterbi methods were changed in the following way:

A 'reduced' forward method was created. It computes the checkpoints from which the actual alpha and scaling values are later recomputed. The size of the reduced matrices is a function of the block size, which is taken as a parameter.
A 'local' forward method was created. It recomputes the alpha values for a particular checkpoint (block of values).
As SphinxTrain has the Viterbi back-pointer computation embedded in the forward pass, the only change to Viterbi was to use the reduced forward method and to recompute the alpha values with the local forward method.
The backward update was modified in a similar way to Viterbi.
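To make the checkpoint idea more concrete, here is a minimal Python sketch of a reduced and a local forward pass. It is not the SphinxTrain code: the function names, the plain-list HMM representation (`init`, `trans`, and per-frame emission likelihoods `emit[t][j]`) and the choice to store a checkpoint at every `block_size`-th frame are all assumptions made for illustration.

```python
import math

def init_alpha(init, emit_0):
    """Scaled alpha for frame 0: returns (normalized alpha, scale)."""
    alpha = [init[j] * emit_0[j] for j in range(len(init))]
    scale = sum(alpha)
    return [a / scale for a in alpha], scale

def forward_step(prev_alpha, trans, emit_t):
    """One scaled forward step: returns (normalized alpha, scale)."""
    n = len(prev_alpha)
    alpha = [emit_t[j] * sum(prev_alpha[i] * trans[i][j] for i in range(n))
             for j in range(n)]
    scale = sum(alpha)
    return [a / scale for a in alpha], scale

def reduced_forward(init, trans, emit, block_size):
    """Run the full forward pass but keep only every block_size-th frame.

    Returns the list of (alpha, scale) checkpoints and the log-likelihood.
    """
    checkpoints = []
    log_lik = 0.0
    alpha = None
    for t in range(len(emit)):
        if t == 0:
            alpha, scale = init_alpha(init, emit[0])
        else:
            alpha, scale = forward_step(alpha, trans, emit[t])
        log_lik += math.log(scale)
        if t % block_size == 0:
            checkpoints.append((alpha, scale))
    return checkpoints, log_lik

def local_forward(checkpoints, trans, emit, block_size, block):
    """Recompute (alpha, scale) for every frame of one block from its checkpoint."""
    start = block * block_size
    end = min(start + block_size, len(emit))
    values = [checkpoints[block]]
    for t in range(start + 1, end):
        values.append(forward_step(values[-1][0], trans, emit[t]))
    return values
```

In this sketch a backward or Viterbi pass would walk the blocks from last to first, call `local_forward` for the current block, use the recomputed values, and then discard them before moving to the previous block. Every frame is therefore computed twice in the forward direction, which is the extra work that the slow-down comes from.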

The modification was successfully tested on the an4 database. It ran somewhat slower, which was anticipated, as the modified algorithm does more computation.

I also tried the modification on the 'rita' (long audio) database, but I had to abort the run because it consumed all of my system's memory. Sadly, this shows no improvement in memory demands so far. It might mean that a significant part of the memory is used outside the forward/backward/Viterbi computation, and it is also possible that I have simply introduced some memory leaks. During these brief tests the block_size parameter was set arbitrarily to 11 rather than to the square root of the number of time frames, which may also affect performance.
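As a rough illustration of why the block size matters, the sketch below counts how many alpha vectors have to be held at once: all checkpoints plus one recomputed block. The helper name and the frame count (one hour of audio at an assumed 100 frames per second) are hypothetical, and the count ignores back-pointers and everything else the trainer allocates.

```python
import math

def stored_frames(num_frames, block_size):
    """Alpha vectors held at once: all checkpoints plus one recomputed block."""
    return math.ceil(num_frames / block_size) + block_size

# Hypothetical utterance: one hour of audio at an assumed 100 frames per second.
num_frames = 360_000

for block_size in (11, math.isqrt(num_frames)):
    print(block_size, stored_frames(num_frames, block_size))
# prints:
# 11 32739
# 600 1200
```

Under these assumptions a block size of 11 stores about 11 times fewer alpha vectors than the unreduced algorithm, while a block size near the square root of the frame count stores about 300 times fewer. This would not explain a memory leak, but it suggests that the arbitrary setting left most of the possible savings unused.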

The actual slow-down and memory requirements still need to be measured in more detailed tests.

Regarding CUDA, I have gained access to three CUDA machines. Two of them belong to Sitola, the Laboratory of Advanced Network Technologies. Access to these machines is provided by MetaCentrum, the Czech academic grid organization, which provides computation and storage resources. The cards are a GeForce GTX 285, a GeForce 8800 Ultra and a GeForce 8400M GS (a rather low-end one in my personal laptop). These are the devices on which the CUDA development and testing will take place. More info to come; please check the project page.