Summer Of Code 2017 Organization Application

This is the plan for our application to Google Summer of Code 2017.


Describe your organization

The CMUSphinx project is a leading automatic speech recognition project in the open source world. Since its release as open source code in 1999, it has provided a platform for building speech recognition applications. It is used in desktop control software, telephony platforms, intelligent houses, computer-assisted language learning tools, information retrieval and mobile applications. CMUSphinx has traditionally provided support for low-resource and underdeveloped languages.

Over its long history, the project has been supported by CMU, SUN, MERL, LIUM and many other organizations. Thousands of students use CMUSphinx in their studies to learn state-of-the-art machine learning algorithms. CMUSphinx has been the basis for more than 20 PhD theses. CMUSphinx has a vibrant and active community of developers led by experienced speech researchers with many years of domain experience. Historically, CMUSphinx has mainly been a complex research platform for evaluating speech recognition systems. Now, CMUSphinx is aiming at the end user. Our goal is to bring open source speech recognition from the universities to every computer, and this moment is getting closer every day.

This year, the CMUSphinx project hopes to extend computer-aided instruction for speaking skills, building on work done in GSoC 2012, with a new authentic intelligibility remediation paradigm that has only recently emerged in automatic pronunciation assessment for language learning. In the past, our GSoC proposals targeted big data systems and algorithms, and development also focused on grid-scale computing and large-scale unsupervised learning.

Summarize your involvement and the successes and challenges of your participation.

Please also list your pass/fail rate for each year.

Overall, participation has been a great experience for both mentors and students. The program allowed us to focus our efforts, explore new opportunities, improve communication within the team and, of course, attract fresh developers. We highly value the publicity the project gained through participation. Several students were able to find jobs and study opportunities at CMUSphinx-related universities thanks to their participation.

We first applied to the program in 2010, but unfortunately we were rejected. In 2011 we were accepted and successfully completed two projects. In 2012 we had 6 students. Although that year was successful, we did not have enough interest from mentors to apply again until this year.

Through our GSoC participation we implemented several major projects that are now at the core of the CMUSphinx toolkit. In one of those years, for example, we were able to integrate an amazing FST framework that we would never otherwise have had the resources to integrate. We also highly value the chance to work as a team on a focused project that mobilizes our resources.

The challenges were common ones, and we learned to solve them quickly. The hardest issue is maintaining communication between the student and the project, keeping track of the student's work, and keeping the student engaged after the summer is over. In 2012 we accidentally requested too many slots: 6 students were impossible to manage, and we had to fail one of them.

Summary per year:

  • 2010: project rejected

  • 2011: 2 students, 2 passed; 1 stayed with the project afterwards

  • 2012: 6 students, 5 passed; 1 stayed with the project afterwards, and another is returning as a hopeful mentor this year

  • 2013-2016: CMUSphinx did not participate in GSoC

Why is your organization applying to participate in Google Summer of Code 2017? What do you hope to gain by participating?

We really like the process: teaching, communicating and learning from the students. We want to do that again and again. It is like a breath of fresh air that keeps the project alive. We really hope that the program can connect us with the greatest students around, and we are sure we will be able to pay it back with high-quality projects.

CMUSphinx is a leading automatic speech recognition project in the open source world. We have a large user base and a growing community, but no other event we could participate in has the energy of the Summer of Code.

We would like to be part of the most successful and most famous open source event of the year. We were lucky to have that opportunity before, and it was amazing to get in touch with new people, get them interested in the project, enable them to work on cool things and empower our community. In previous years we were able to build a great team of students, project developers and senior mentors, and thus laid a foundation for development in the years ahead.

We want to do it again!

What criteria did you use to select your mentors for this year’s program?

Please be as specific as possible

We do not need a selection process: our mentoring team has been stable over the years and we know each other. The same people who mentored before will participate this time. We will not accept outside mentors, because that proved to be a poor experience in previous years.

How many potential mentors do you have for this year's program? What criteria did you use to select them?

We have two potential mentors willing to participate, and at least three more who may participate depending on student interest. Most are actively contributing to the project, and we work together on a day-to-day basis. We will not accept outside mentors, since we had a negative experience with such mentors before. Mentors must be tightly aligned with the project goals and understand the project infrastructure and code. Other valuable qualities in a mentor are teaching experience, the ability to share knowledge, and passion for the project goals.

What is your plan for dealing with disappearing students?

Through our previous participation in GSoC we gained good experience in working with students. There will be weekly team meetings, progress reports, IRC discussions and public announcements. We will track our students as closely as possible and will fail students who disappear after 2 missed reports. We understand the issue; in particular, we are aware of students disappearing after the first payment. Our goal is to get students involved and engaged to such a degree that we can be sure they will not disappear.

What is your plan for dealing with disappearing mentors?

We have decided on a strict policy of 2 mentors per student this year, and we will not request more slots than this quota allows. Organization administrators will also be able to work with the student if needed.

What steps will you take to encourage students to interact with your project's community before and during the program?

All communication with each student will happen through a single channel and will be closely tracked by both the mentors and the admin. There will be no case where a student has no one to talk to. We want to maintain a very high level of communication across our project team, so no student will be able to stay out of the loop. We will ensure the following:

  • Students communicate with mentors on a daily basis;

  • Students plan, implement and promote their projects through the community portal;

  • Students participate in community activities - bug fixing, patch review, IRC support;

  • Students position themselves as a part of the community;

  • Students never lose focus;

  • Students always learn something new.

What will you do to encourage your accepted students to stick with the project after Google Summer of Code concludes?

The field of automatic speech recognition involves not only coding but also research. We will suggest that students continue their projects as a thesis or coursework, and we will offer our help to those who decide to proceed. Of course, we will do our best to help students get their work accepted into the upstream branch, so that they can mention their contribution when applying for jobs. We believe that their valuable contributions to our project will be highly valued by employers.