Speech projects in GSoC 2014


Google Summer of Code is definitely one of the largest programs in the open source world. This summer, 1400 students will enjoy participating in open source projects. Four projects in the pool are dedicated to speech recognition, and it is really amazing that all of them are planning to use CMUSphinx!

Here is the list of hot new projects for you to track and participate in:

Speech to Text Enhancement Engine for Apache Stanbol

Student: Suman Saurabh
Organization: Apache Software Foundation
Assigned mentors: Andreas Kuckartz

The enhancement engine uses the Sphinx library to convert captured audio to text. The media (audio/video) file is parsed from the ContentItem and converted to a suitable audio format by the Xuggler libraries. Speech is then extracted by Sphinx to 'text/plain', annotated with the temporal position of the extracted text. Sphinx uses an acoustic model and a language model to map utterances to text, so the engine will also provide support for uploading custom acoustic and language models.
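
To give a rough idea of the recognition step, here is a minimal sketch using the pocketsphinx Python bindings; the Stanbol engine itself is Java-based, and the model and file paths below are purely illustrative:

    from pocketsphinx import Decoder

    # Illustrative model paths; a real deployment would point at the
    # uploaded acoustic model, language model and dictionary.
    config = Decoder.default_config()
    config.set_string('-hmm', 'model/en-us')                # acoustic model
    config.set_string('-lm', 'model/en-us.lm.bin')          # language model
    config.set_string('-dict', 'model/cmudict-en-us.dict')  # pronunciation dictionary
    decoder = Decoder(config)

    # The audio is assumed to be 16 kHz, 16-bit mono raw PCM, i.e. the kind
    # of stream the transcoding step would produce from the media file.
    decoder.start_utt()
    with open('speech.raw', 'rb') as f:
        while True:
            buf = f.read(4096)
            if not buf:
                break
            decoder.process_raw(buf, False, False)
    decoder.end_utt()

    # Full hypothesis plus per-word temporal positions (frames are 10 ms each),
    # which is the information the engine would turn into text annotations.
    print(decoder.hyp().hypstr)
    for seg in decoder.seg():
        print(seg.word, seg.start_frame, seg.end_frame)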

Development of both online and offline speech recognition for B2G and Firefox

Student: Andre Natal
Organization: Mozilla
Assigned mentors: Guilherme Gonçalves
Short description: Mozilla needs to fill the gap between B2G and other mobile OSes, and desktop Firefox also lacks this important feature, which is already available in Google Chrome. In addition, we'll have a new Web API empowering developers, and every speech recognition application already developed and running on Chrome will start to work on Firefox without changes. In the future, this can be integrated into other Mozilla products, opening the door to a whole new class of interactive applications.

I know Andre very well; he is a very talented person, so I'm sure this project will be a huge success. By the way, you can track it in its GitHub repository too: https://github.com/andrenatal/speechrtc

Sugar Listens - Speech Recognition within the Sugar Learning Platform

Student: Rodrigo Parra
Organization: Sugar Labs
Assigned mentors: tchx84
Short description: Sugar Listens seeks to provide an easy-to-use speech recognition API to educational content developers within the Sugar Learning Platform. This will allow developers to integrate speech-enabled interfaces into their Sugar Activities, letting users interact with Sugar through voice commands. This goal will be achieved by exposing the open-source speech recognition engine Pocketsphinx as a D-Bus service.
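
As a rough illustration of that approach, here is a minimal sketch of a D-Bus service wrapping Pocketsphinx, written with dbus-python and a GLib main loop; the bus name, object path and interface below are invented for the example and are not the actual Sugar Listens names:

    import dbus
    import dbus.service
    from dbus.mainloop.glib import DBusGMainLoop
    from gi.repository import GLib
    from pocketsphinx import Decoder

    # Hypothetical names, for illustration only.
    BUS_NAME = 'org.example.SpeechRecognizer'
    OBJECT_PATH = '/org/example/SpeechRecognizer'
    IFACE = 'org.example.SpeechRecognizer'

    class SpeechService(dbus.service.Object):
        def __init__(self, bus):
            super(SpeechService, self).__init__(bus, OBJECT_PATH)
            # In a real service the acoustic model, language model and
            # dictionary would be configured here.
            self.decoder = Decoder(Decoder.default_config())

        @dbus.service.method(IFACE, in_signature='s', out_signature='s')
        def Recognize(self, raw_path):
            # Decode a 16 kHz, 16-bit mono raw PCM file and return the text.
            self.decoder.start_utt()
            with open(raw_path, 'rb') as f:
                self.decoder.process_raw(f.read(), False, True)
            self.decoder.end_utt()
            hyp = self.decoder.hyp()
            return hyp.hypstr if hyp else ''

    DBusGMainLoop(set_as_default=True)
    bus = dbus.SessionBus()
    name = dbus.service.BusName(BUS_NAME, bus)
    SpeechService(bus)
    GLib.MainLoop().run()

With something like this running on the session bus, a Sugar Activity could call Recognize over D-Bus without linking against Pocketsphinx directly.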

Integrate Plasma Media Center with Simon to make navigation easier

Student: Ashish Madeti
Organization: KDE
Assigned mentors: Peter Grasch, Shantanu Tushar
Short description: Users can currently navigate Plasma Media Center with a keyboard and mouse. Now I will add voice as a way for a user to navigate and use PMC. This will be done by integrating PMC with Simon.

I know Simon has a long and successful history of GSoC participation, so this project is also going to be very interesting.

Also, this summer we are going to run a few student internships unrelated to GSoC. They are going to be very interesting too, so stay tuned!