We recently learned about the CDRV project, a great effort to make the world more accessible.
CDRV (the Spanish acronym for “Device Controller by Speech Recognition”) is a device that controls a lift chair using voice commands. The purpose of the project is to help people with reduced mobility, such as people with disabilities or elderly people, who are not able to use their hands to operate the lift chair. The CDRV hardware consists of an overclocked Raspberry Pi Model B, an extension board developed to control the motors of the lift chair, a Wi-Fi dongle, an external manual control and a Logitech USB microphone. The operating system was built with Buildroot and runs entirely in RAM, so the power can be cut without shutting down the OS. The extension board was designed with KiCAD, and the plastic enclosure was designed entirely in OpenSCAD and printed on a Prusa i3 RepRap.
The offline speech recognition software running on the Raspberry Pi is pocketsphinx, the C implementation of CMUSphinx. It uses the Spanish acoustic model provided by the CMUSphinx community and a slightly modified version of the pocketsphinx_continuous application to perform a command and control task. A language model with approximately 25 common words acts as a garbage model. No confidence measures are computed.
CDRV listens continuously for the utterance “Lift Chair Activation” (actually “Activación Sillón”). Once this command is recognized, the device emits a confirmation beep. It then waits a few seconds for one of the actions “Up” (Sube) or “Down” (Baja). If either action is recognized, the lift chair is activated accordingly. If not, a deactivation beep is emitted. At any time, the command “Stop” (Para) turns off the lift chair motors.
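The command flow described above can be sketched as a small state machine. This is only an illustration, not the CDRV source: the class name, the event symbols and the 5-second window are assumptions (the post only says "some seconds"), and the recognizer is left out, so any source of recognized text can feed it.

```ruby
# encoding: utf-8
# Hypothetical sketch of the CDRV command flow. Names, event symbols and
# the 5-second window are assumptions, not taken from the CDRV code.
class LiftChairController
  ACTIVATION = "activación sillón"
  ACTIONS = { "sube" => :up, "baja" => :down }.freeze
  STOP = "para"

  attr_reader :state, :events

  def initialize(window_seconds: 5.0)
    @window = window_seconds
    @state = :idle      # :idle or :armed (activation heard, waiting for action)
    @armed_at = nil
    @events = []        # beeps and motor commands, in the order they occurred
  end

  # Feed one recognized utterance and the time (in seconds) it was heard.
  def hear(text, now)
    expire(now)
    phrase = text.to_s.strip.downcase
    if phrase == STOP
      # "Para" is honored in any state and switches the motors off.
      disarm
      @events << :motors_off
    elsif @state == :idle && phrase == ACTIVATION
      @state = :armed
      @armed_at = now
      @events << :confirmation_beep
    elsif @state == :armed && ACTIONS.key?(phrase)
      disarm
      @events << ACTIONS[phrase]
    end
    # Any other text is treated as garbage and ignored.
    self
  end

  # Call periodically so the waiting window can expire without new speech.
  def tick(now)
    expire(now)
    self
  end

  private

  def disarm
    @state = :idle
    @armed_at = nil
  end

  def expire(now)
    return unless @state == :armed && now - @armed_at > @window
    disarm
    @events << :deactivation_beep
  end
end
```

Passing the time in explicitly, instead of reading the clock inside the class, keeps the sketch easy to test: an action word heard without a prior activation leaves `events` empty, while activation followed by "Sube" within the window yields `[:confirmation_beep, :up]`.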
The device has been tested with real users in real situations, with very good results. A few users are not recognized properly because of their particular phonetics, but in general the device reacts only when the correct words are pronounced, even with a loud television nearby. In the video below you can find a short demonstration of the device.
This project is being developed for a non-profit organization called CVI (Center for Independent Living). The non-profit ONCE foundation is funding the project and the UPC university provides technical support.
We are pleased to announce the availability of the Ruby bindings for pocketsphinx created by Howard Wilson.
pocketsphinx-ruby is a high-level Ruby wrapper for the pocketsphinx C API. It uses the Ruby Foreign Function Interface (FFI) to directly load and call functions in libpocketsphinx, as well as libsphinxad for recording live audio using a number of different audio backends.
The goal of the project is to make it as easy as possible for the Ruby community to experiment with speech recognition, in particular for use in grammar-based command and control applications. Setting up a real-time recognizer is as simple as:
configuration = Pocketsphinx::Configuration::Grammar.new do
  sentence "Go forward ten meters"
  sentence "Go backward ten meters"
end

Pocketsphinx::LiveSpeechRecognizer.new(configuration).recognize do |speech|
  puts speech
end
This library supports Ruby MRI 1.9.3+, JRuby, and Rubinius. It depends on the current development versions of Pocketsphinx and Sphinxbase. Homebrew recipes are available for a quick start on OS X.
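Grammar-based recognition in pocketsphinx is commonly driven by a JSGF grammar. As a rough illustration of what the two sentences in the configuration above amount to, the sketch below builds a JSGF string in plain Ruby. The helper `jsgf_for` and the grammar name `commands` are hypothetical and not part of pocketsphinx-ruby, and the exact grammar the library generates may differ.

```ruby
# Hypothetical helper for illustration only; not part of pocketsphinx-ruby.
# Builds a JSGF grammar whose single public rule accepts any one of the
# given sentences.
def jsgf_for(sentences, name: "commands")
  # Normalize whitespace and lowercase the words, on the assumption that
  # the pronunciation dictionary uses lowercase entries.
  alternatives = sentences.map { |s| s.downcase.strip.split(/\s+/).join(" ") }
  raise ArgumentError, "at least one sentence is required" if alternatives.empty?

  [
    '#JSGF V1.0;',
    "grammar #{name};",
    "public <#{name}> = #{alternatives.join(' | ')};"
  ].join("\n")
end

puts jsgf_for(["Go forward ten meters", "Go backward ten meters"])
```

Each sentence becomes one alternative of the public rule, so the recognizer would only ever return one of the listed commands.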
The LIUM team, the main CMUSphinx contributor, announced today the release of version 2 of the TEDLIUM corpus, an amazing database prepared from transcribed TED talks.
Details on this update can be found in the corresponding publication:
A. Rousseau, P. Deléglise, and Y. Estève, "Enhancing the TED-LIUM Corpus with Selected Data for Language Modeling and More TED Talks", in Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), May 2014.
This database of 200 hours of speech allows you to build a speech recognition system with very good performance using open source toolkits like Kaldi or CMUSphinx. A Kaldi recipe for TEDLIUM v1 is available in the repository, and we hope that the update to TEDLIUM v2 will be available soon.
Modern technology such as automatic alignment of transcribed audio has made it easy to create very competitive databases. It is therefore easy to predict that the size of the available databases will quickly grow to thousands of hours, and that we will see a very significant improvement in the accuracy of open source recognition. The problem is that quite powerful training clusters will be required to work with such databases, since a model cannot be trained on a single server in an acceptable amount of time.
Microsoft has traditionally had very good speech recognition technology. The recently announced speech recognition assistant Cortana is one of the best assistants available. However, it might lack support for your native language or simply not behave the way you expect (hey, Siri still doesn't support many languages either).
Thanks to the wonderful work of Toine de Boer, you can now enjoy Pocketsphinx on the Windows Phone platform. It is as straightforward as on Android: just download the project from our GitHub at http://github.com/cmusphinx/pocketsphinx-wp-demo, import it into Visual Studio and run it on the phone. You can enjoy all the features of CMUSphinx on Windows Phone: continuous hands-free operation, switchable grammars, and support for custom acoustic and language models. There is no need to wait for the speech recognition input in the game. We hope this opens up possibilities for great new applications.
The demo listens continuously for the keyphrase "oh mighty computer", and once the keyphrase is detected it switches to grammar mode to let you input some information. Let us know how it works.