GSoC 2012 Accepted Projects Announced


We are happy to announce the list of students who will participate in Google Summer of Code 2012 with the CMUSphinx organization:

Letter-to-Phoneme Conversion in sphinx4

Task

Currently sphinx4 can only work with a predefined dictionary. It is possible to build a phonetic dictionary automatically, but that requires applying machine learning for training, developing a decoder module, and testing. Models for various languages need to be trained as well. This project will implement letter-to-sound rules with OpenFST in sphinx4.

Student

John Salatas

Pronunciation Evaluation

Task

Implement a simple reading and pronunciation learning system.

Students

Srikanth Ronanki and Troy Lee

Semantic Language Model

Task

Current language models are very basic: they don't really understand what is being transcribed, and that affects the error rate. Create a decoder over the lattices that will select the semantically correct path and produce a perfectly readable result.

Student

Wencan Luo

Punctuation and Capitalization Postprocessing Framework

Task

Create a language-independent postprocessing framework that will turn ASR results into readable text with punctuation, abbreviations, and capitalization.

http://www.makapa.de/Paulik_Sent_ICASSP08.pdf

Student

Alexandru-Dan Tomescu

Web Data Collection for Language Modeling

Task

Write a crawler that can collect text data on a given topic for language model training.

Student

Emre Çelikten

We expect great features to be implemented this summer. Please stay tuned; news will appear here.