LexTreeLinguist (sphinx4-core 5prealpha-SNAPSHOT API)

java.lang.Object
- edu.cmu.sphinx.linguist.lextree.LexTreeLinguist

All Implemented Interfaces:

Linguist, Configurable
```
public class LexTreeLinguist
extends java.lang.Object
implements Linguist
```
A linguist that can represent large vocabularies efficiently. This class implements the Linguist interface. The main role of any linguist is to represent the search space for the decoder. The initial state in the search space can be retrieved by a SearchManager via a call to getInitialSearchState. This method returns a SearchState. Successor states can be retrieved via calls to SearchState.getSuccessors(). There are a number of search state sub-interfaces that are used to indicate different types of states in the search space:
- WordSearchState - represents a word in the search space
- UnitSearchState - represents a unit in the search space
- HMMSearchState - represents an HMM state in the search space
A linguist has a great deal of latitude about the order in which it returns states. For instance, a 'flat' linguist may return a WordState at the beginning of a word, while a 'tree' linguist may return WordStates at the end of a word. Likewise, a linguist may omit certain state types completely (such as a unit state). Some SearchManagers may want to know a priori the order in which states will be generated by the linguist. The method getSearchStateOrder can be used to retrieve the order of the states returned by the linguist.
Depending on the vocabulary size and topology, the search space represented by the linguist may include a very large number of states. Some linguists will generate the search states dynamically, that is, the object representing a particular state in the search space is not created until it is needed by the SearchManager. SearchManagers often need to be able to determine if a particular state has been entered before by comparing states. Because SearchStates may be generated dynamically, the SearchState.equals() call (as opposed to the reference equals '==' method) should be used to determine if states are equal. The states returned by the linguist will generally provide very efficient implementations of equals and hashCode. This will allow a SearchManager to maintain collections of states in HashMaps efficiently.
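The point about equals and hashCode can be illustrated with a minimal sketch. The classes below are hypothetical stand-ins, not Sphinx-4 classes; the fields `nodeId` and `history` are assumptions chosen only to show why a regenerated state must compare equal by value rather than by reference.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a dynamically generated search state (not a Sphinx-4 class).
final class SketchState {
    private final int nodeId;      // assumed identifier of the tree node
    private final String history;  // assumed word-history key

    SketchState(int nodeId, String history) {
        this.nodeId = nodeId;
        this.history = history;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SketchState)) return false;
        SketchState other = (SketchState) o;
        return nodeId == other.nodeId && history.equals(other.history);
    }

    @Override
    public int hashCode() {
        return Objects.hash(nodeId, history);
    }
}

final class StateLookupDemo {
    // Keeps the best score per state; returns true if the state had not been entered before.
    static boolean record(Map<SketchState, Float> best, SketchState s, float score) {
        Float old = best.get(s);
        if (old == null || score > old) {
            best.put(s, score);
        }
        return old == null;
    }

    public static void main(String[] args) {
        Map<SketchState, Float> best = new HashMap<>();
        SketchState a = new SketchState(7, "the cat");
        SketchState b = new SketchState(7, "the cat"); // same state, generated a second time
        System.out.println(a == b);                    // false: two distinct objects
        System.out.println(record(best, a, -10f));     // true: first visit
        System.out.println(record(best, b, -5f));      // false: state was already entered
    }
}
```

A reference comparison would treat `a` and `b` as different states, so the map would grow with every regeneration; value-based equality keeps one entry per state.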
LexTreeLinguist Characteristics
Some characteristics of this linguist:
- Dynamic - the linguist generates search states on the fly, greatly reducing the required memory footprint
- Tree topology - this linguist represents the search space as an inverted tree. Units near the roots of words are shared among many different words. This reduces the number of states that need to be considered during the search.
- HMM sharing - because of state tying in the acoustic models, it is often the case that triphone units that differ in the right context actually are represented by the same HMM. This linguist recognizes this case and will use a single state to represent the HMM instead of two states. This can greatly reduce the number of states generated by the linguist.
- Small-footprint - this linguist uses a few other techniques to reduce the overall footprint of the search space. One technique that is particularly helpful is to share the end word units (where the largest fanout of states occurs) across all of the words. For a 60K word vocabulary, this can reduce the number of tree nodes from about 2 million to around 3,000.
- Quick loading - this linguist can compile the search space very quickly. A 60K word vocabulary can be made ready in less than 10 seconds.
This linguist is not a general purpose linguist. It does impose some constraints:
- unit size - this linguist supports only units that are no larger than triphones.
- n-gram grammars - this linguist will generate the search space directly from the N-Gram language model. The vocabulary supported is the intersection of the words found in the language model and the words that exist in the Dictionary. It is assumed that all sequences of words in the vocabulary are valid. This linguist doesn't support arbitrary grammars.
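The vocabulary rule in the n-gram constraint amounts to a set intersection. The sketch below is illustrative only: `supportedVocabulary` and the word lists are hypothetical and are not part of the Sphinx-4 API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

final class VocabularySketch {
    // Returns the words present in both the language model and the dictionary.
    static Set<String> supportedVocabulary(Set<String> lmWords, Set<String> dictWords) {
        Set<String> vocab = new TreeSet<>(lmWords); // sorted copy, inputs stay untouched
        vocab.retainAll(dictWords);                 // keep only words the dictionary can pronounce
        return vocab;
    }

    public static void main(String[] args) {
        Set<String> lm = new HashSet<>(Arrays.asList("cat", "dog", "zebra"));
        Set<String> dict = new HashSet<>(Arrays.asList("cat", "dog", "door"));
        System.out.println(supportedVocabulary(lm, dict)); // [cat, dog]
    }
}
```

A word that appears in only one of the two sources ("zebra" or "door" here) is dropped from the search space.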
Design Notes
The following are some notes describing the design of this linguist. They may be helpful to those who want to understand how this linguist works but are not necessary if you are only interested in using this linguist.
Search Space Representation
It has been shown that representing the search space as a tree can greatly reduce the number of active states in a search since the units at the beginnings of words can be shared across multiple words. For example, with a large vocabulary (60K words), at the end of a word, with a flat representation, we have to provide transitions to the initial state of each possible word. That is 60K transitions. In a tree-based system we only need to provide transitions to each initial phone (within its context). That is about 1600 transitions. This is a substantial reduction. Conceptually, this tree consists of a node for each possible initial unit. Each node can have an arbitrary number of children which can be either unit nodes or word nodes.
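The prefix sharing described above can be sketched with a toy lexical tree. This is a simplified illustration, not the HMMTree implementation: it uses context-independent phones, and the class names and pronunciations are made up for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// One node per unit; words are attached to the node that ends their pronunciation.
final class LexNode {
    final Map<String, LexNode> children = new LinkedHashMap<>();
    final List<String> words = new ArrayList<>();
}

final class LexTreeSketch {
    final LexNode root = new LexNode();

    // Inserts a word, reusing any existing nodes for its leading phones.
    void addWord(String word, String... phones) {
        LexNode node = root;
        for (String p : phones) {
            node = node.children.computeIfAbsent(p, k -> new LexNode());
        }
        node.words.add(word);
    }

    // Number of transitions needed at a word end: one per distinct initial unit.
    int initialTransitions() {
        return root.children.size();
    }

    // Counts all unit nodes below the given node.
    static int countUnitNodes(LexNode n) {
        int count = 0;
        for (LexNode child : n.children.values()) {
            count += 1 + countUnitNodes(child);
        }
        return count;
    }

    public static void main(String[] args) {
        LexTreeSketch tree = new LexTreeSketch();
        tree.addWord("dog", "D", "AO", "G");
        tree.addWord("dot", "D", "AA", "T");
        tree.addWord("door", "D", "AO", "R");
        tree.addWord("cat", "K", "AE", "T");
        System.out.println(tree.initialTransitions());   // 2 (D and K), versus 4 in a flat layout
        System.out.println(countUnitNodes(tree.root));   // 9, versus 12 units in a flat layout
    }
}
```

With only four words the saving is small, but the same mechanism is what turns 60K word-initial transitions into roughly 1600 in the figures quoted above.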
This linguist uses the HMMTree class to build and represent the tree. The HMMTree is given the dictionary and language model and builds the lextree. Instead of representing the nodes in the tree as phonemes and words as is typically done, the HMMTree represents the tree as HMMs and words. The HMM is essentially a unit within its context. This is typically a triphone (although for some units, such as SIL, it is a simple phone). Representing the nodes as HMMs instead of phonemes yields a much larger tree, but also has some advantages:
- Because of state-tying in the acoustic models, many distinct triphones actually share an HMM. Representing the nodes as HMMs allows these shared HMMs to be represented in the tree only once instead of many times, as would happen if we represented nodes as phones or triphones. This leads to a reduction in the actual number of states that are considered during a search. Experiments have shown that this can reduce the required beam by a factor of 2 or 3.
- By representing the nodes as HMMs, we avoid having to look up the HMM for a particular triphone during the search. This is a modest savings.
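The first advantage can be sketched as follows. The tying table, the triphone notation `unit(left,right)` and the HMM identifiers are all hypothetical; a real acoustic model defines its own tying, and this is not how HMMTree stores its nodes.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Arrays;
import java.util.Map;
import java.util.Set;

final class HmmSharingSketch {
    // Maps each triphone to its tied HMM id and keeps each HMM only once.
    static Set<Integer> distinctHmms(List<String> triphones, Map<String, Integer> tying) {
        Set<Integer> ids = new LinkedHashSet<>();
        for (String t : triphones) {
            Integer id = tying.get(t);
            if (id == null) {
                throw new IllegalArgumentException("no HMM for triphone " + t);
            }
            ids.add(id); // a second triphone tied to the same HMM adds nothing
        }
        return ids;
    }

    public static void main(String[] args) {
        Map<String, Integer> tying = new HashMap<>();
        tying.put("AO(D,G)", 17); // assumed: these two right contexts are tied
        tying.put("AO(D,R)", 17);
        tying.put("AO(D,T)", 23);
        List<String> successors = Arrays.asList("AO(D,G)", "AO(D,R)", "AO(D,T)");
        System.out.println(distinctHmms(successors, tying)); // [17, 23]
    }
}
```

Three triphone successors collapse to two HMM nodes, which is the kind of reduction the bullet above refers to.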
There are some disadvantages in representing the tree with HMMs:
- size - since HMMs represent units in their context, we have many more copies of each node. For instance, instead of having a single unit representing the initial 'd' in the word 'dog' we would have about 40 HMMs, one for each possible left context.
- speed - building the much larger HMM tree can take much more time, since many more nodes are needed to represent the tree.
- complexity - representing the tree with HMMs is more complex. There are multiple entry points for each word/unit that have to be dealt with.
Luckily, the size and speed issues can be mitigated (by adding a bit more complexity, of course). The bulk of the nodes in the HMM tree are the word ending nodes. There is a word ending node for each possible right context. To reduce space, all of the word ending nodes are replaced by a single EndNode. During the search, the actual HMM nodes for a particular EndNode are generated on request. These sets of HMM nodes can be shared among different word endings, and therefore are cached. The effect of using this EndNode optimization is to reduce the space required by the tree by about 300 MB and the time required to generate the tree from about 60 seconds to about 6 seconds.
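The generate-on-request and caching idea can be sketched as below. This is an assumption-laden illustration rather than the actual EndNode code: the cache key (base unit plus left context), the string representation of HMM nodes and the class name are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class EndNodeCacheSketch {
    private final Map<String, List<String>> cache = new HashMap<>();
    private final List<String> rightContexts;
    private int expansions = 0; // how many times a node set was actually built

    EndNodeCacheSketch(List<String> rightContexts) {
        this.rightContexts = rightContexts;
    }

    // Expands a word-ending placeholder into one node per right context, on first request only.
    List<String> expand(String baseUnit, String leftContext) {
        String key = baseUnit + "|" + leftContext; // assumed cache key
        return cache.computeIfAbsent(key, k -> {
            expansions++;
            List<String> nodes = new ArrayList<>();
            for (String rc : rightContexts) {
                nodes.add(baseUnit + "(" + leftContext + "," + rc + ")");
            }
            return nodes;
        });
    }

    int expansions() {
        return expansions;
    }

    public static void main(String[] args) {
        EndNodeCacheSketch sketch = new EndNodeCacheSketch(Arrays.asList("SIL", "D", "K"));
        List<String> first = sketch.expand("T", "AA");  // e.g. the end of "dot"
        List<String> second = sketch.expand("T", "AA"); // another word ending in AA T
        System.out.println(first == second);            // true: the cached set is shared
        System.out.println(sketch.expansions());        // 1
    }
}
```

The tree itself stores only the placeholder, so the per-right-context fanout costs memory only for endings that the search actually reaches.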
Word Histories
We use explicit backoff for word histories. This technique has proven useful and reduces the number of states. The reasoning is as follows. With a vocabulary of size N, there are N^2 unique bigram histories, so the token stack will have N^2*K unique tokens, where K is the number of states per HMM. For a 100k vocabulary and 3 states per HMM, that is 3*10^10 tokens (max). Of course, a large majority of them will be pruned, but it's still far too many. If you stick with the actual n-gram used (i.e. accounting explicitly for backoff), this number drops tremendously. Most bigrams don't have corresponding trigrams: not all 10^10 bigrams have trigrams. We only need to store as many explicit tokens as the number of bigrams that have trigrams.
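The arithmetic above can be checked with a short sketch. The method names are hypothetical, and the figure of 5 million bigrams with trigram continuations is an invented example value, not a measurement from any language model; multiplying the backoff count by K is likewise an assumption made to keep the two numbers comparable.

```java
final class HistoryCountSketch {
    // Upper bound on tokens when every bigram history is distinct: N^2 * K.
    static long maxTokens(long vocabSize, long statesPerHmm) {
        return vocabSize * vocabSize * statesPerHmm;
    }

    // Tokens needed when only bigrams that actually have trigrams keep an explicit history.
    static long backoffTokens(long bigramsWithTrigrams, long statesPerHmm) {
        return bigramsWithTrigrams * statesPerHmm;
    }

    public static void main(String[] args) {
        System.out.println(maxTokens(100_000L, 3L));       // 30000000000, i.e. 3*10^10
        System.out.println(backoffTokens(5_000_000L, 3L)); // 15000000 for the assumed 5M bigrams
    }
}
```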

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`class`	`LexTreeLinguist.LexTreeEndUnitState` Represents a unit in the search space
`class`	`LexTreeLinguist.LexTreeEndWordState` Represents the final end of utterance word
`class`	`LexTreeLinguist.LexTreeHMMState` Represents a HMM state in the search space
`class`	`LexTreeLinguist.LexTreeNonEmittingHMMState` Represents a non emitting hmm state
`class`	`LexTreeLinguist.LexTreeUnitState` Represents a unit in the search space
`class`	`LexTreeLinguist.LexTreeWordState` Represents a word state in the search space

Field Summary

Fields
Modifier and Type	Field and Description
`protected boolean`	`addFillerWords`
`protected edu.cmu.sphinx.linguist.lextree.HMMTree`	`hmmTree`
`protected float`	`languageWeight`
`static java.lang.String`	`PROP_ACOUSTIC_MODEL` The property that defines the acoustic model to use when building the search graph
`static java.lang.String`	`PROP_ADD_FILLER_WORDS` The property that controls whether filler words are automatically added to the vocabulary
`static java.lang.String`	`PROP_CACHE_SIZE` The property that defines the size of the arc cache (zero to disable the cache).
`static java.lang.String`	`PROP_DICTIONARY` The property that defines the dictionary to use for this grammar
`static java.lang.String`	`PROP_FULL_WORD_HISTORIES` The property that determines whether or not full word histories are used to determine when two states are equal.
`static java.lang.String`	`PROP_GENERATE_UNIT_STATES` The property to control whether or not the linguist will generate unit states.
`static java.lang.String`	`PROP_GRAMMAR` The property that defines the grammar to use when building the search graph
`static java.lang.String`	`PROP_LANGUAGE_MODEL` The property for the language model to be used by this grammar
`static java.lang.String`	`PROP_UNIGRAM_SMEAR_WEIGHT` The property that determines the weight of the smear.
`static java.lang.String`	`PROP_UNIT_MANAGER` The property that defines the unit manager to use when building the search graph
`static java.lang.String`	`PROP_WANT_UNIGRAM_SMEAR` The property that determines whether or not unigram probabilities are smeared through the lextree.

Fields inherited from interface edu.cmu.sphinx.linguist.Linguist
PROP_FILLER_INSERTION_PROBABILITY, PROP_LANGUAGE_WEIGHT, PROP_SILENCE_INSERTION_PROBABILITY, PROP_UNIT_INSERTION_PROBABILITY, PROP_WORD_INSERTION_PROBABILITY

Constructor Summary

Constructors
Constructor and Description
`LexTreeLinguist()`
`LexTreeLinguist(AcousticModel acousticModel, UnitManager unitManager, LanguageModel languageModel, Dictionary dictionary, boolean fullWordHistories, boolean wantUnigramSmear, double wordInsertionProbability, double silenceInsertionProbability, double fillerInsertionProbability, double unitInsertionProbability, float languageWeight, boolean addFillerWords, boolean generateUnitStates, float unigramSmearWeight, int maxArcCacheSize)`

Method Summary

Modifier and Type	Method and Description
`void`	`allocate()` Allocates the linguist.
`void`	`deallocate()` Deallocates the linguist.
`protected void`	`generateHmmTree()`
`Dictionary`	`getDictionary()`
`LanguageModel`	`getLanguageModel()` Retrieves the language model for this linguist
`SearchGraph`	`getSearchGraph()` Retrieves search graph.
`void`	`newProperties(PropertySheet ps)` This method is called when this configurable component needs to be reconfigured.
`void`	`startRecognition()` Called before a recognition
`void`	`stopRecognition()` Called after a recognition

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - PROP_GRAMMAR
```
@S4Component(type=Grammar.class)
public static final java.lang.String PROP_GRAMMAR
```
    The property that defines the grammar to use when building the search graph
    
    See Also:
    
    Constant Field Values
  - PROP_ACOUSTIC_MODEL
```
@S4Component(type=AcousticModel.class)
public static final java.lang.String PROP_ACOUSTIC_MODEL
```
    The property that defines the acoustic model to use when building the search graph
    
    See Also:
    
    Constant Field Values
  - PROP_UNIT_MANAGER
```
@S4Component(type=UnitManager.class,
             defaultClass=UnitManager.class)
public static final java.lang.String PROP_UNIT_MANAGER
```
    The property that defines the unit manager to use when building the search graph
    
    See Also:
    
    Constant Field Values
  - PROP_FULL_WORD_HISTORIES
```
@S4Boolean(defaultValue=true)
public static final java.lang.String PROP_FULL_WORD_HISTORIES
```
    The property that determines whether or not full word histories are used to determine when two states are equal.
    
    See Also:
    
    Constant Field Values
  - PROP_LANGUAGE_MODEL
```
@S4Component(type=LanguageModel.class)
public static final java.lang.String PROP_LANGUAGE_MODEL
```
    The property for the language model to be used by this grammar
    
    See Also:
    
    Constant Field Values
  - PROP_DICTIONARY
```
@S4Component(type=Dictionary.class)
public static final java.lang.String PROP_DICTIONARY
```
    The property that defines the dictionary to use for this grammar
    
    See Also:
    
    Constant Field Values
  - PROP_CACHE_SIZE
```
@S4Integer(defaultValue=0)
public static final java.lang.String PROP_CACHE_SIZE
```
    The property that defines the size of the arc cache (zero to disable the cache).
    
    See Also:
    
    Constant Field Values
  - PROP_ADD_FILLER_WORDS
```
@S4Boolean(defaultValue=false)
public static final java.lang.String PROP_ADD_FILLER_WORDS
```
    The property that controls whether filler words are automatically added to the vocabulary
    
    See Also:
    
    Constant Field Values
  - PROP_GENERATE_UNIT_STATES
```
@S4Boolean(defaultValue=false)
public static final java.lang.String PROP_GENERATE_UNIT_STATES
```
    The property to control whether or not the linguist will generate unit states. When this property is false the linguist may omit UnitSearchState states. For some search algorithms this will allow for a faster search with more compact results.
    
    See Also:
    
    Constant Field Values
  - PROP_WANT_UNIGRAM_SMEAR
```
@S4Boolean(defaultValue=true)
public static final java.lang.String PROP_WANT_UNIGRAM_SMEAR
```
    The property that determines whether or not unigram probabilities are smeared through the lextree. During the expansion of the tree the language probability could be only calculated when we reach word end node. Until that point we need to keep path alive and give it some language probability. See Alleva, F., Huang, X. and Hwang, M.-Y., "Improvements on the pronunciation prefix tree search organization", Proceedings of ICASSP, pp. 133-136, Atlanta, GA, 1996. for the description of this technique.
    
    See Also:
    
    Constant Field Values
  - PROP_UNIGRAM_SMEAR_WEIGHT
```
@S4Double(defaultValue=1.0)
public static final java.lang.String PROP_UNIGRAM_SMEAR_WEIGHT
```
    The property that determines the weight of the smear. See PROP_WANT_UNIGRAM_SMEAR
    
    See Also:
    
    Constant Field Values
  - addFillerWords
```
protected boolean addFillerWords
```
  - languageWeight
```
protected float languageWeight
```
  - hmmTree
```
protected edu.cmu.sphinx.linguist.lextree.HMMTree hmmTree
```
- Constructor Detail
  - LexTreeLinguist
```
public LexTreeLinguist(AcousticModel acousticModel,
                       UnitManager unitManager,
                       LanguageModel languageModel,
                       Dictionary dictionary,
                       boolean fullWordHistories,
                       boolean wantUnigramSmear,
                       double wordInsertionProbability,
                       double silenceInsertionProbability,
                       double fillerInsertionProbability,
                       double unitInsertionProbability,
                       float languageWeight,
                       boolean addFillerWords,
                       boolean generateUnitStates,
                       float unigramSmearWeight,
                       int maxArcCacheSize)
```
  - LexTreeLinguist
```
public LexTreeLinguist()
```
- Method Detail
  - newProperties
```
public void newProperties(PropertySheet ps)
                   throws PropertyException
```
    Description copied from interface: Configurable
    
    This method is called when this configurable component needs to be reconfigured.
    
    Specified by:
    
    newProperties in interface Configurable
    
    Parameters:
    
    ps - a property sheet holding the new data
    
    Throws:
    
    PropertyException - if there is a problem with the properties.
  - allocate
```
public void allocate()
              throws java.io.IOException
```
    Description copied from interface: Linguist
    
    Allocates the linguist. Resources allocated by the linguist are allocated here. This method may take many seconds to complete depending upon the linguist.
    Implementor's Note - A well written linguist will allow allocate to be called multiple times without harm. This will allow a linguist to be shared by multiple search managers.
    
    Specified by:
    
    allocate in interface Linguist
    
    Throws:
    
    java.io.IOException - if an IO error occurs
  - deallocate
```
public void deallocate()
                throws java.io.IOException
```
    Description copied from interface: Linguist
    
    Deallocates the linguist. Any resources allocated by this linguist are released.
    Implementor's Note - if the linguist is being shared by multiple searches, the deallocate should only actually deallocate things when the last call to deallocate is made. Two approaches for dealing with this:
    (1) Keep an allocation counter that is incremented during allocate and decremented during deallocate. Only when the counter reaches zero should the actual deallocation be performed.
    (2) Do nothing in deallocate - just let the GC take care of things
    
    Specified by:
    
    deallocate in interface Linguist
    
    Throws:
    
    java.io.IOException - if an IO error occurs
  - getSearchGraph
```
public SearchGraph getSearchGraph()
```
    Description copied from interface: Linguist
    
    Retrieves search graph. The search graph represents the search space to be used to guide the search.
    Implementor's note: This method is typically called at the beginning of each recognition and therefore should be
    
    Specified by:
    
    getSearchGraph in interface Linguist
    
    Returns:
    
    the search graph
  - startRecognition
```
public void startRecognition()
```
    Called before a recognition
    
    Specified by:
    
    startRecognition in interface Linguist
  - stopRecognition
```
public void stopRecognition()
```
    Called after a recognition
    
    Specified by:
    
    stopRecognition in interface Linguist
  - getLanguageModel
```
public LanguageModel getLanguageModel()
```
    Retrieves the language model for this linguist
    
    Returns:
    
    the language model (or null if there is none)
  - getDictionary
```
public Dictionary getDictionary()
```
  - generateHmmTree
```
protected void generateHmmTree()
```
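The allocation-counter approach mentioned in the implementor's note for deallocate can be sketched as follows. `RefCountedResource` is a hypothetical class, not part of Sphinx-4, and the `loaded` flag merely stands in for whatever expensive resources a real linguist would build and release.

```java
// Hypothetical sketch of approach (1): count allocate calls and release on the last deallocate.
final class RefCountedResource {
    private int allocations = 0;
    private boolean loaded = false;

    synchronized void allocate() {
        if (allocations++ == 0) {
            loaded = true; // stand-in for the expensive build step; runs only on the first call
        }
    }

    synchronized void deallocate() {
        if (allocations == 0) {
            return; // unbalanced call; nothing to release
        }
        if (--allocations == 0) {
            loaded = false; // the last user is gone, so release for real
        }
    }

    synchronized boolean isLoaded() {
        return loaded;
    }

    public static void main(String[] args) {
        RefCountedResource r = new RefCountedResource();
        r.allocate();   // first search manager
        r.allocate();   // second search manager shares the same resources
        r.deallocate();
        System.out.println(r.isLoaded()); // true: one user remains
        r.deallocate();
        System.out.println(r.isLoaded()); // false: released on the last call
    }
}
```

Repeated allocate calls are harmless under this scheme, which matches the note that a linguist may be shared by multiple search managers.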
