OpenEars, a speech library for iPhone using CMUSphinx


We are very pleased to see the ongoing progress on OpenEars. Please check it out:

http://www.politepix.com/openears/

OpenEars is an iOS library for continuous, multithreaded speech recognition and text-to-speech using CMU Pocketsphinx and CMU Flite, for use in iPhone and iPad development. OpenEars can:

• Do continuous speech recognition on a managed background thread that uses less than 10% CPU on average on an iPhone 3G while listening (decoding and text-to-speech use more CPU),
• Quickly suspend and resume continuous recognition on demand,
• Choose between 8 Flite voices for text-to-speech using a simple config file,
• Suspend recognition during Flite speech automatically when using the external speaker,
• Make use of a Cocoa-standard static library project, allowing SDK and architecture re-targeting from the application project,
• Manage and report the state of the Audio Session to handle microphone changes and interruptions like incoming calls,
• Return input/output decibel metering from the audio functions so it is ready for your UI,
• Let you use these features via Objective-C methods (see the sketch after this list).
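To make that last point concrete, a call into OpenEars from an app might look something like the sketch below. This is only an illustration: the controller name and methods are assumptions, not the actual OpenEars API, which is documented at http://www.politepix.com/openears/.

    // Hypothetical sketch: openEarsController, suspendRecognition and
    // resumeRecognition are illustrative names, not the real OpenEars interface.
    #import "OpenEars.h" // assumed umbrella header

    - (IBAction)pauseListening:(id)sender {
        // Pause the continuous listening loop on demand.
        [self.openEarsController suspendRecognition]; // hypothetical method
    }

    - (IBAction)resumeListening:(id)sender {
        // Bring continuous recognition back when the app needs it again.
        [self.openEarsController resumeRecognition];  // hypothetical method
    }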

Please report any bugs at http://www.politepix.com/forums/

Since you are probably eager to know more, we asked the OpenEars developer, Halle Winkler, a few questions. Halle is a professional developer specializing in software development for the iPhone, iPad, and iPod Touch, as well as UX design, with an emphasis on usability and the emerging interaction possibilities of multitouch platforms.

Q: How are you going to expand this?

Halle: I'm going to see what users ask for, but my guess is that they will want in-app lm/dic generation or a RESTful API for creating new lm/dic files. Other features that I would consider would be switching lmsets on the fly in the course of the listening loop, or maybe an API for managing a logical tree of different outcomes from commands, which was definitely the most headache-causing aspect of AllEars to test. I'm very interested to see what people do with it and what they tell me they want. I'm not ready to publish any kind of roadmap yet.
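For readers who haven't worked with these files: the .dic file is a plain-text pronunciation dictionary that maps each word in the vocabulary to its phonemes, and the matching .lm file is a statistical language model over the same words (usually in ARPA n-gram format). As a purely illustrative example, the dictionary for a tiny command vocabulary could look like this:

    HELLO   HH AH L OW
    START   S T AA R T
    STOP    S T AA P
    WORLD   W ER L D

Generating a matched lm/dic pair like this from a word list supplied at runtime is what in-app or RESTful lm/dic generation would automate.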

Oh, something I definitely need to do for an upcoming version is improve the responsiveness, threading, and CPU overhead of Flite processing and speech playback. I need to use a lower-level audio API for Flite and get Flite streaming working. On an iPhone 4 some of the voices can generate a sentence in a third of a second, which is impressive and well within a responsive range from a UX perspective, but on my iPhone 3G it can take a second and a half plus the latency of the audio API I'm using, which gets into "is anything happening?" UX territory (I know that is still an impressive speed given the CPU, but for end users it's confusing).
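For context on where that latency comes from, whole-utterance synthesis with the Flite C API looks roughly like the sketch below; it assumes the bundled kal voice and is not OpenEars' actual implementation. The streaming work Halle describes would replace the single flite_text_to_wave call with playback of smaller chunks as they are synthesized.

    #include "flite.h"

    cst_voice *register_cmu_us_kal(const char *voxdir); /* one of the bundled Flite voices */

    void speak_whole_utterance(const char *text) {
        flite_init();
        cst_voice *voice = register_cmu_us_kal(NULL);
        /* The entire sentence is synthesized into a buffer before any audio
           can start playing -- on a slow CPU this is the visible delay. */
        cst_wave *wave = flite_text_to_wave(text, voice);
        /* ...hand wave->samples to an audio playback API here... */
        delete_wave(wave);
    }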

Q: What are the most important issues users complain about?

Halle: There are no complaints yet, but I would be glad to hear any feedback from OpenEars users.

Q: What are the most important Pocketsphinx issues you've encountered?

Halle: Well, when I was originally using Pocketsphinx, a big frustration was the build_for_iphone.sh method of creating static libraries. It often didn't work for me because I usually don't have my developer tools installed in the default location (which the script seems to require), and once I got it working I ended up having to make or copy 12 different static libraries in order to target 3 different SDK versions while I was experimenting. Then, in the middle of it, Apple shipped a beta iOS 4 SDK whose installer wiped out those libraries. That had nothing to do with Pocketsphinx, but it was a time-killer to figure out what had happened, and that is the point at which I made a new method for linking to your libraries.

OpenEars ships with Cocoa static library projects for Pocketsphinx and Sphinxbase that are linked to the user's app project via cross-project references. When the user wants to target the simulator versus the device, or build against one SDK but deploy with backwards compatibility to an earlier SDK, the static library projects pick that information up passively from the main project and recompile themselves at build time for the user's app project.

In general I think Pocketsphinx is fantastic! It runs really well on the iPhone in continuous mode. More documentation is always good. I tried to err on the side of overkill for the OpenEars docs, since speech recognition can be a very complex topic to get into as an outsider: you expect that you'll compile the library and suddenly have a device that understands an entire language's worth of vocabulary with no trouble, but actually there are lms, dics, hmms, all the arguments that you can run Pocketsphinx with, and so on. The challenge, as I see it, is easing developers into the complexities that it would benefit them (or me) to understand on a more fundamental level, while encapsulating the complexities that don't need to be grappled with in a run-of-the-mill speech app.
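As a rough illustration of how those pieces fit together, initializing a Pocketsphinx decoder in C ties the acoustic model (hmm), language model (lm), and pronunciation dictionary (dic) together through command-line-style arguments. The paths below are placeholders, and the available arguments vary somewhat between Pocketsphinx releases:

    #include <pocketsphinx.h>

    /* Placeholder paths: the hmm directory holds the acoustic model, while the
       .lm and .dic files define the vocabulary and its pronunciations. */
    cmd_ln_t *config = cmd_ln_init(NULL, ps_args(), TRUE,
                                   "-hmm",  "/path/to/acoustic-model",
                                   "-lm",   "/path/to/model.lm",
                                   "-dict", "/path/to/model.dic",
                                   NULL);
    ps_decoder_t *decoder = ps_init(config);
    /* From here, audio buffers are fed to the decoder and hypotheses are read
       back; OpenEars wraps that kind of loop behind its Objective-C interface. */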