CMUCLMTK Development

Check out the code:

The repository also contains binaries for Windows and Linux. Binaries are stored by OS, in bin/x86-nt/ and bin/x86-linux/.

Configure and compile on Linux:

./ && make && make install

Optionally, you can pass options to autogen, for example --prefix=/somewhere
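As a rough sketch of passing an install prefix: the script name autogen.sh and the install location below are assumptions, not taken from this page, so adjust both to match your checkout. The command line is only assembled and printed, so it can be checked before anything is run.

```shell
# Assumption: the bootstrap script is named autogen.sh and forwards its
# arguments to configure. Neither the name nor the prefix comes from this page.
BOOTSTRAP="./autogen.sh"
PREFIX="$HOME/cmuclmtk"   # hypothetical install location

# Assemble the full build command line and print it for inspection.
BUILD_CMD="$BOOTSTRAP --prefix=$PREFIX && make && make install"
echo "$BUILD_CMD"
```

Once the script name is confirmed, the printed line can be run from the top of the source tree.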

Configure and compile for Windows:

You should do your development under Cygwin, since the makefiles are autobuild-oriented. Be sure to install the developer and MinGW components. Compile with MinGW: this produces Windows-native binaries, which are therefore portable from computer to computer.

aclocal && autoheader && automake --add-missing --copy && autoconf
./configure --enable-mingw CFLAGS="-g -Wall -O0" CXXFLAGS="-g -Wall -O0"
make && make install

For Development on Linux

Ensure that binaries are created on both Linux and Windows platforms.

  • build with autoconf
  • commit
  • do an update from a Windows box with Cygwin
  • compile under Cygwin
  • commit again

For Development on Windows

  • build with Cygwin
  • commit
  • do an update on a Linux box
  • build with autoconf
  • commit again

If you’re working on a filesystem shared by Linux and Windows, you can skip the middle commit and update steps.
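The two workflows above can be sketched as a small set of shell functions. This is only an illustration under stated assumptions: Subversion as the version control client and autogen.sh as the Linux bootstrap script are guesses, and the function names are hypothetical. The Windows commands are the ones listed in the Windows section. By default nothing is executed; the commands are only printed.

```shell
#!/bin/sh
# Dry-run sketch of the two-platform build-and-commit cycle.
# Assumptions (not stated on this page): svn is the version control
# client, and the Linux bootstrap script is named autogen.sh.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run them

# Print a command in dry-run mode (quoting is not preserved in the
# printed form), otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Linux build.
build_linux() {
    run ./autogen.sh && run make && run make install
}

# Windows build, run from a Cygwin shell.
build_windows() {
    run aclocal && run autoheader &&
    run automake --add-missing --copy && run autoconf &&
    run ./configure --enable-mingw CFLAGS="-g -Wall -O0" CXXFLAGS="-g -Wall -O0" &&
    run make && run make install
}

# Hypothetical wrappers around the commit and update steps.
commit_binaries() { run svn commit -m "$1"; }
update_checkout() { run svn update; }

# Build on one platform, then commit the result.
# $1 is "linux" or "windows".
platform_pass() {
    case "$1" in
        linux)   build_linux ;;
        windows) build_windows ;;
        *)       echo "usage: platform_pass linux|windows" >&2; return 1 ;;
    esac && commit_binaries "Rebuild $1 binaries"
}
```

When developing on Linux, the cycle would be platform_pass linux on the Linux machine, then update_checkout followed by platform_pass windows on the Windows machine; swap the two platform names when developing on Windows. On a shared filesystem, build_linux and build_windows can be called directly, followed by a single commit_binaries.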

Future Release Plans

There has not been an official release of the toolkit since the one put out by Cambridge. We intend to make one very soon. Since the toolkit itself has not been fundamentally updated aside from the 32-bit word ID change, this can be considered version 2.1 of the toolkit. Nonetheless, there will be many significant updates:

  • Perl scripts for easy language model construction from various source texts (DONE)

  • Dictionary building support for English using Festival (DONE)

  • Chinese segmentation

  • Chinese pronunciation generation

Successive releases will contain more fundamental changes to the toolkit.
Mainly, we intend to add support for modified Kneser-Ney smoothing.