Mozilla Announces WebSpeech API

One of the main problems with existing open source speech recognition systems is that they are not really designed to be used in end-user software. They are mainly research projects created by universities, and they are intended to support new research. They make it easy to add new features quickly and to get the best results in various evaluations.

End-user software doesn't work like that. You might not need to demonstrate the best accuracy, but you do need to match user expectations. For example, users expect a reasonable result even if they speak far from the microphone or whisper. No modern system can recognize whispered speech reliably, so expectations are not met and users complain. A lot of work is required to solve problems like this.

However, since many commercial companies have promoted speech recognition to end users, open source software has also got a chance. We can build software for the mass market and match commercial solutions in terms of accuracy and robustness. An important step here is to gain the audience's attention: instead of software for geeks, we need to become software for everybody, which is a very hard problem to solve. It's great that there are projects with big ambitions here, in particular at the Mozilla Foundation.

The Mozilla Foundation recently announced a project to support the WebSpeech HTML API in its browsers. While celebrating 10 years of Firefox development, Mozilla CTO Andreas Gal announced this and many other features coming to the Mozilla codebase. A base system was implemented by Andre Natal during a Google Summer of Code project, and Andre continues to work on it. You can get an idea of where the project stands and how it is developing from the following post. So we will probably see a speech interface in the Firefox browser and Firefox OS pretty soon.

One of the main issues for wide adoption of speech interfaces will be support for both big and small languages. Firefox considers this an important direction of development, in particular support for Indian languages. I hope we are going to see a lot of progress here.

Pocketvox is listening to you!

We are pleased to announce a release of Pocketvox, a voice control project for the Linux desktop.

What is Pocketvox?

Pocketvox is a desktop application and a development library written in C. It was created by Benoit Franquet, a recently graduated French engineer who is visually impaired and passionate about desktop accessibility and application development.

Pocketvox is based on several well-known open source libraries such as Sphinxbase, Pocketsphinx, GStreamer, GLib-2.0, GObject-2.0, Gtk+, ... from the GNOME project, as well as Espeak, which lets Pocketvox respond with voice, and Xlib, which is used to detect the focused window.

Pocketvox comes with several tools that let both developers and desktop users create or use voice-controlled applications.

Pocketvox for desktop users

For the desktop, Pocketvox comes with:

- A menu launcher to launch the Pocketvox application
- A panel applet to manage Pocketvox
- A configuration GUI and its launcher in the menu
- A common way to store the configuration using GSettings
- A way to choose the microphone input
- A way to send commands over TCP
- An activation keyword in order to start actions when you say a specific word
- A very flexible way to manage modules

Pocketvox for developers

For developers, Pocketvox provides:

- A .pc file to develop with Pocketvox
- A Python interface
- A very easy way to design your own module in Python

The Python interface has been built using GObject introspection. A very basic example is available on the GitHub page of the project.

Some addons Pocketvox comes with

Pocketvox has been built around a system of modules. Why? Because this allows desktop users to define and create new modules very quickly, just by writing a dictionary file with a very simple structure like this:

open my documents=xdg-open ~/Documents
open my images=xdg-open ~/Images

Moreover, thanks to this structure, users are able to associate a specific application with a module. When Pocketvox is running, it detects the focused application and executes the commands listed in that module's dictionary file.
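
To make the dictionary format above concrete, here is a minimal Ruby sketch that parses "phrase=command" lines into a lookup table. It is not Pocketvox's own parser (Pocketvox itself is written in C). The helper name parse_dictionary and the choice to skip blank or malformed lines are assumptions made purely for illustration.

```ruby
# Hypothetical helper, not part of Pocketvox: parse "phrase=command" lines
# into a Hash mapping each spoken phrase to the shell command it triggers.
def parse_dictionary(text)
  text.each_line.with_object({}) do |line, commands|
    line = line.strip
    next if line.empty?
    # Split on the first "=" only, so a command may itself contain "=".
    phrase, command = line.split("=", 2)
    next if command.nil? # assumed behavior: ignore lines without "="
    commands[phrase.strip] = command.strip
  end
end

dictionary = parse_dictionary(<<~DICT)
  open my documents=xdg-open ~/Documents
  open my images=xdg-open ~/Images
DICT

# Look up the command associated with a recognized phrase.
puts dictionary["open my documents"]
```

A real module would then hand the looked-up command to the shell once the recognizer returns a matching phrase. That step is left out here because it depends on how Pocketvox dispatches commands.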

In addition, some bash scripts have been integrated into the Pocketvox project so that users can quickly create a custom language model using the cmuclmtk toolkit.

Pocketvox works out of the box with the language models, acoustic models and dictionaries available on our website.

Pocketvox is ready for translation and already available in French and English. All the steps needed to translate it are described in the Pocketvox repository on GitHub. Pocketvox is waiting for you to make it available in other languages.

How to get Pocketvox?

You can find all the information you need to try Pocketvox on the GitHub page of the project.

The first release was published yesterday, so you can get it here.

A way to make the world accessible

We recently got information about the CDRV project, a great effort to make the world more accessible.

CDRV (“Device Controller by Speech Recognition” in Spanish) is a device which can control a lift chair using voice commands. The purpose of this project is to help people with mobility problems, such as people with disabilities or elderly people, who are not able to use their hands to control the lift chair. The hardware of the CDRV consists of an overclocked Raspberry Pi model B and an extension board developed to control the motors of the lift chair, a Wi-Fi dongle, an external manual control and a Logitech USB microphone. The operating system has been built with Buildroot and runs completely in RAM, so it is possible to turn off the power without shutting down the OS. The extension board has been developed with KiCAD, and the plastic box has been designed entirely with OpenSCAD and printed with a Prusa i3 RepRap.

The offline speech recognition software that runs on the Raspberry Pi is the C implementation of CMUSphinx: pocketsphinx. It uses the Spanish acoustic model provided by the CMUSphinx community and a slightly modified version of the pocketsphinx_continuous application to perform a command and control task. A language model with approximately 25 common words behaves like a garbage model. No confidence measures of any kind are computed.

CDRV is continuously listening for the utterance “Lift Chair Activation” (“Activación Sillón”, actually). Once this command is recognized, it produces a confirmation beep. Then it waits for a few seconds, listening for the actions “Up” (Sube) or “Down” (Baja). If either of these actions is recognized, the lift chair is activated accordingly. If not, a deactivation beep is emitted. At any time, the command “Stop” (Para) turns off the lift chair motors.
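
The command flow described above can be read as a small state machine. The following Ruby sketch illustrates that reading only. CDRV itself is built on a modified pocketsphinx_continuous in C, so the class name, state names and action symbols here are hypothetical. The sketch also assumes that the device returns to waiting for the activation phrase once a movement has been started.

```ruby
# Hypothetical sketch of the CDRV command flow; the class name, state names
# and action symbols are illustrative and not taken from the project.
class LiftChairController
  attr_reader :state, :actions

  def initialize
    @state = :idle   # waiting for the activation phrase
    @actions = []    # log of what the device would do
  end

  # Feed one recognized utterance (lowercased) into the controller.
  def hear(utterance)
    if utterance == "para"
      # "Stop" is honored in every state.
      @actions << :motors_off
      @state = :idle
    elsif @state == :idle && utterance == "activación sillón"
      @actions << :confirmation_beep
      @state = :armed
    elsif @state == :armed && %w[sube baja].include?(utterance)
      @actions << (utterance == "sube" ? :motor_up : :motor_down)
      @state = :idle # assumption: re-arming is needed for the next move
    end
  end

  # Call when the listening window ends without "Sube" or "Baja".
  def timeout
    return unless @state == :armed
    @actions << :deactivation_beep
    @state = :idle
  end
end

chair = LiftChairController.new
["activación sillón", "sube", "para"].each { |u| chair.hear(u) }
p chair.actions
```

Requiring the activation phrase before “Sube” or “Baja” means that a stray word from a conversation or a television cannot move the chair on its own, while “Para” stays available in every state.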

The device has been tested with real users in real situations, with very good results. A few users are not recognized properly because of their particular phonetics, but in general the device reacts only when the correct words are pronounced, even with a loud television nearby. In the video below you can find a short demonstration of the device.

This project is being developed for a non-profit organization called CVI (Center of independent living). The non-profit ONCE foundation is funding the project, and the UPC university provides technical support.

Pocketsphinx Ruby is available on GitHub

We are pleased to announce the availability of the Ruby bindings for pocketsphinx created by Howard Wilson.


pocketsphinx-ruby is a high-level Ruby wrapper for the pocketsphinx C API. It uses the Ruby Foreign Function Interface (FFI) to directly load and call functions in libpocketsphinx, as well as libsphinxad for recording live audio using a number of different audio backends.

The goal of the project is to make it as easy as possible for the Ruby community to experiment with speech recognition, in particular for use in grammar-based command and control applications. Setting up a real-time recognizer is as simple as:

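require 'pocketsphinx-ruby'

# Restrict recognition to a small grammar of allowed sentences
configuration = Pocketsphinx::Configuration::Grammar.new do
  sentence "Go forward ten meters"
  sentence "Go backward ten meters"
end

# Listen to live audio and print each recognized utterance
Pocketsphinx::LiveSpeechRecognizer.new(configuration).recognize do |speech|
  puts speech
end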

This library supports Ruby MRI 1.9.3+, JRuby, and Rubinius. It depends on the current development versions of Pocketsphinx and Sphinxbase. Homebrew recipes are available for a quick start on OS X.