Processing Speech Recognition Results With Wit.AI

The biggest challenge for developers today is building a natural user interface. People already use gesture and speech to interact with their PCs and devices; such natural ways of interacting with technology make it easier to learn how to operate it. Major companies such as Microsoft and Intel are putting a lot of effort into research on natural interaction.

CMUSphinx is a critical component of the open source infrastructure for creating natural user interfaces. However, it is not the only component required to build an application. One of the most frequently asked questions is: how do I analyze speech recognition output to turn it into actionable information? The answer is not simple. Again, it comes down to complex NLP technology that you can apply to analyze user intent, as well as a dataset to help with the analysis.

In simple cases you can just parse number strings to turn them into values, or apply regex pattern matching to extract the name of the object to act upon. Sphinx4 includes a technology that can parse grammar output to assign semantic values to a user request. In general, though, this is a more complex task.
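As a minimal sketch of the simple case, the Python snippet below turns a recognized phrase into a structured command using a regex and a small number-word table. The command pattern, the vocabulary, and the function names (words_to_number, parse_command) are hypothetical and chosen only for illustration; they are not part of CMUSphinx.

```python
import re

# Hypothetical vocabulary: spoken number words mapped to integer values.
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}

# Assumed command shape: "set [the] <object> to <number words>".
COMMAND = re.compile(r"^set (?:the )?(?P<object>\w+) to (?P<value>[a-z ]+)$")


def words_to_number(words):
    """Sum number words, e.g. ['twenty', 'five'] -> 25 (small numbers only)."""
    total = 0
    for word in words:
        if word in UNITS:
            total += UNITS[word]
        elif word in TENS:
            total += TENS[word]
        else:
            raise ValueError("unknown number word: " + word)
    return total


def parse_command(hypothesis):
    """Return {'object': ..., 'value': ...} or None if the text does not match."""
    match = COMMAND.match(hypothesis.strip().lower())
    if match is None:
        return None
    try:
        value = words_to_number(match.group("value").split())
    except ValueError:
        return None
    return {"object": match.group("object"), "value": value}


print(parse_command("set the volume to twenty five"))
```

This approach works only while the set of phrasings stays small; every new way of saying the same thing needs another pattern, which is why more general NLP tools become attractive.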

Recently, Wit.AI announced the availability of their NLP technology for developers. If you are looking for a simple technology to create a natural language interface, Wit.AI seems worth trying. Today, with the combination of the best engines like CMUSphinx and Wit, you can finally bring the power of voice to your app.

You can build an NLP analysis engine with Wit.AI in three simple stages:

  1. Provide a few examples of the responses you expect.
  2. Send raw user input to the API. You get structured information in return.
  3. Wit learns from usage and helps you improve your configuration.
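The second stage could look roughly like the Python sketch below, which sends recognizer output to Wit over HTTP. It assumes Wit's public HTTP endpoint https://api.wit.ai/message with a q query parameter and a Bearer access token; the helper names build_wit_request and query_wit are hypothetical, and the exact shape of the returned JSON depends on your Wit configuration and API version, so it is not shown here.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Wit's HTTP "message" endpoint; check the current Wit docs.
WIT_MESSAGE_URL = "https://api.wit.ai/message"


def build_wit_request(text, access_token):
    """Build (but do not send) a GET request carrying the recognized text."""
    query = urllib.parse.urlencode({"q": text})
    return urllib.request.Request(
        WIT_MESSAGE_URL + "?" + query,
        headers={"Authorization": "Bearer " + access_token},
    )


def query_wit(text, access_token):
    """Send the text to Wit and return the decoded JSON response."""
    request = build_wit_request(text, access_token)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

In an application, the text argument would be the hypothesis string produced by the recognizer, and the access token would come from your own Wit app settings.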

Bringing natural language understanding to the masses of developers is a really hard problem, and we are glad that tools are appearing to simplify the solution.

Pocketsphinx Python bindings ported from Cython to SWIG

As of today, a large change that switches to SWIG-generated Python bindings has been merged into the pocketsphinx and sphinxbase trunk.

SWIG is an interface compiler that connects programs written in C and C++ with scripting languages such as Perl, Python, Ruby, and Tcl. It works by taking the declarations found in C/C++ header files and using them to generate the wrapper code that scripting languages need to access the underlying C/C++ code. In addition, SWIG provides a variety of customization features that let you tailor the wrapping process to suit your application.

With this port we hope to increase the coverage of the pocketsphinx bindings and provide a uniform, documented interface in various languages: Python, Ruby, Java.

To test the change, check out sphinxbase and pocketsphinx from trunk and see the examples in pocketsphinx/swig/python/test.

Open Source Dictation is Coming

Implementing an open source dictation tool that everyone could use is an old idea: no servers, no networking, and no need to share your private speech with someone else. This is certainly not a trivial project, and it has been started many times, but it is something really world-changing. Now it is alive again, powered by CMUSphinx.

See the details of the Simon project's ongoing effort to implement open source dictation.

Voice-enable Your Website With CMUSphinx

Voice-enabling websites has long been a dream. However, no good technology existed for this, either because speech recognition on the web required a connection to a server or because it required installing a binary plugin.

The great news is that you can now use CMUSphinx in any modern browser, completely on the client side. No need for installation, no need to maintain a voice recognition server farm. This is a really cool technology.

Sylvain Chevalier has been working on a port of Pocketsphinx to JavaScript using emscripten. Combined with the Web Audio API, it works great as a real-time recognizer for web applications, running entirely in the browser, without a plug-in.

It's on GitHub; comments, suggestions and contributions are more than welcome!