Linguist (sphinx4-core 5prealpha-SNAPSHOT API)

All Superinterfaces:

Configurable

All Known Implementing Classes:

AFlatLinguist, AllphoneLinguist, DynamicFlatLinguist, FlatLinguist, LexTreeLinguist
```
public interface Linguist
extends Configurable
```
The linguist is responsible for representing and managing the search space for the decoder. The role of the linguist is to provide, upon request, the search graph that is to be used by the decoder. The linguist is a generic interface that provides language model services.
The main role of any linguist is to represent the search space for the decoder. The search space can be retrieved by a SearchManager via a call to getSearchGraph. This method returns a SearchGraph. The initial state in the search graph can be retrieved via a call to getInitialState. Successor states can be retrieved via calls to SearchState.getSuccessors(). There are a number of search state subinterfaces that are used to indicate different types of states in the search space:
- WordSearchState - represents a word in the search space.
- UnitSearchState - represents a unit in the search space.
- HMMSearchState - represents an HMM state in the search space.
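The traversal described above can be sketched as a breadth-first walk that starts at the initial state, follows successors, and classifies each state by its subinterface. The types below (`State`, `WordState`, `UnitState`, `HmmState`, `Node`) are simplified stand-ins invented for this sketch, not the sphinx4 interfaces, and the real getSuccessors() may return a different type than the plain list assumed here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class GraphWalkSketch {
    // Hypothetical stand-ins for illustration; NOT the sphinx4 types.
    interface State { List<State> getSuccessors(); }
    interface WordState extends State {}
    interface UnitState extends State {}
    interface HmmState extends State {}

    // Statically built node; identity equality is enough for a fixed graph.
    static class Node implements State {
        final List<State> next = new ArrayList<>();
        public List<State> getSuccessors() { return next; }
    }
    static final class Word extends Node implements WordState {}
    static final class Unit extends Node implements UnitState {}
    static final class Hmm extends Node implements HmmState {}

    // Breadth-first walk from the initial state, counting states per kind.
    static Map<String, Integer> countByType(State initial) {
        Map<String, Integer> counts = new TreeMap<>();
        Set<State> seen = new HashSet<>();
        Deque<State> queue = new ArrayDeque<>();
        seen.add(initial);
        queue.add(initial);
        while (!queue.isEmpty()) {
            State s = queue.remove();
            String kind = s instanceof WordState ? "word"
                    : s instanceof UnitState ? "unit"
                    : s instanceof HmmState ? "hmm" : "other";
            counts.merge(kind, 1, Integer::sum);
            for (State n : s.getSuccessors()) {
                if (seen.add(n)) {   // enqueue each state only once
                    queue.add(n);
                }
            }
        }
        return counts;
    }

    // Tiny graph: word -> unit -> hmm1 -> (hmm1 self loop, hmm2).
    static State tinyGraph() {
        Word w = new Word();
        Unit u = new Unit();
        Hmm h1 = new Hmm();
        Hmm h2 = new Hmm();
        w.next.add(u);
        u.next.add(h1);
        h1.next.add(h1);
        h1.next.add(h2);
        return w;
    }
}
```

Calling `GraphWalkSketch.countByType(GraphWalkSketch.tinyGraph())` yields one word state, one unit state, and two HMM states; the self loop on the first HMM state is visited only once because of the `seen` set.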
A linguist has a great deal of latitude about the order in which it returns states. For instance, a 'flat' linguist may return a WordState at the beginning of a word, while a 'tree' linguist may return WordStates at the end of a word. Likewise, a linguist may omit certain state types completely (such as a unit state). Some search managers may want to know a priori the order in which different state types will be generated by the linguist. The method SearchGraph.getNumStateOrder() can be used to retrieve the number of state types that will be returned by the linguist. The method SearchState.getOrder() returns the ranking for a particular state.
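One way a search manager might use these two values is to keep one bucket of states per order. The sketch below assumes that getOrder() returns a value between 0 and getNumStateOrder() - 1; that range is an assumption of this example, and `OrderedState`, `ofOrder`, and `bucketByOrder` are hypothetical names, not sphinx4 API.

```java
import java.util.ArrayList;
import java.util.List;

final class StateOrderSketch {
    // Hypothetical stand-in exposing only the rank; NOT the sphinx4 SearchState.
    interface OrderedState { int getOrder(); }

    // Convenience factory for a state with a fixed order.
    static OrderedState ofOrder(final int order) {
        return () -> order;
    }

    // Groups states into numStateOrder buckets indexed by their order.
    // Assumes 0 <= getOrder() < numStateOrder; anything else throws
    // IndexOutOfBoundsException.
    static List<List<OrderedState>> bucketByOrder(
            List<? extends OrderedState> states, int numStateOrder) {
        List<List<OrderedState>> buckets = new ArrayList<>();
        for (int i = 0; i < numStateOrder; i++) {
            buckets.add(new ArrayList<>());
        }
        for (OrderedState s : states) {
            buckets.get(s.getOrder()).add(s);
        }
        return buckets;
    }
}
```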
Depending on the vocabulary size and topology, the search space represented by the linguist may include a very large number of states. Some linguists will generate the search states dynamically, that is, the object representing a particular state in the search space is not created until it is needed by the SearchManager. SearchManagers often need to be able to determine if a particular state has been entered before by comparing states. Because SearchStates may be generated dynamically, the SearchState.equals() call (as opposed to the reference-equality operator '==') should be used to determine if states are equal. The states returned by the linguist will generally provide very efficient implementations of equals and hashCode. This will allow a SearchManager to maintain collections of states in HashMaps efficiently.
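The point about equals versus '==' can be illustrated with a minimal sketch. The `State` class here is hypothetical (a word plus a position) and only mimics a dynamically generated state: each call to `successor()` builds a fresh object, so two logically identical states are distinct references but compare equal and hash alike.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

final class DynamicStateSketch {
    // Hypothetical dynamically generated state; NOT a sphinx4 class.
    static final class State {
        final String word;
        final int position;

        State(String word, int position) {
            this.word = word;
            this.position = position;
        }

        // Creates a new object on every call, as a dynamic linguist might.
        State successor() {
            return new State(word, position + 1);
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof State)) return false;
            State other = (State) o;
            return position == other.position && word.equals(other.word);
        }

        @Override
        public int hashCode() {
            return Objects.hash(word, position);
        }
    }

    // Returns true only the first time a logically equal state is seen.
    static boolean firstVisit(Set<State> visited, State s) {
        return visited.add(s);
    }
}
```

With this in place, a `HashSet` of visited states treats two separately generated copies of the same state as one entry, which a reference comparison would not.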
The lifecycle of a linguist is as follows:
- The linguist is created by the configuration manager
- The linguist is given an opportunity to register its properties via a call to its register method.
- The linguist is given a new set of properties via the newProperties call. A well-written linguist should be prepared to respond to a newProperties call at any time.
- The allocate method is called. During this call the linguist generally allocates resources such as acoustic and language models. This can often take a significant amount of time. A well-written linguist will be able to deal with multiple calls to allocate. This can happen if a linguist is shared by multiple search managers.
- The getSearchGraph method is called by the search manager to retrieve the search graph that is used to guide the decoding/search. This method is typically called at the beginning of each recognition. The linguist should endeavor to return the search graph as quickly as possible to reduce any recognition latency. Some linguists will pre-generate the search graph in the allocate method, and only need to return a reference to the search graph, while other linguists may dynamically generate the search graph on each call. Also note that some linguists may change the search graph between calls, so a search manager should always get a new search graph before the start of each recognition.
- The startRecognition method is called just before recognition starts. This gives the linguist the opportunity to prepare for the recognition task. Some linguists may keep caches of search states that need to be primed or flushed. Note, however, that if a linguist depends on startRecognition or stopRecognition, it is unlikely to be reentrant, which could limit its usefulness in some multi-threaded environments.
- The stopRecognition method is called just after recognition completes. This gives the linguist the opportunity to clean up after the recognition task. Some linguists may keep caches of search states that need to be primed or flushed. Note, however, that if a linguist depends on startRecognition or stopRecognition, it is unlikely to be reentrant, which could limit its usefulness in some multi-threaded environments.
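The runtime part of this lifecycle (allocate through deallocate) can be sketched as a small driver. `DemoLinguist` below is a stand-in that only mirrors the method names documented on this page, with `Object` in place of SearchGraph; the configuration steps are omitted, and `RecordingLinguist`, `recognize`, and `demo` are hypothetical names introduced for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class LifecycleSketch {
    // Stand-in mirroring the documented method names; NOT the sphinx4 interface.
    interface DemoLinguist {
        void allocate() throws IOException;
        Object getSearchGraph();
        void startRecognition();
        void stopRecognition();
        void deallocate() throws IOException;
    }

    // Records each call so the order can be inspected.
    static final class RecordingLinguist implements DemoLinguist {
        final List<String> calls = new ArrayList<>();
        public void allocate() { calls.add("allocate"); }
        public Object getSearchGraph() { calls.add("getSearchGraph"); return "graph"; }
        public void startRecognition() { calls.add("startRecognition"); }
        public void stopRecognition() { calls.add("stopRecognition"); }
        public void deallocate() { calls.add("deallocate"); }
    }

    // Drives the linguist through the given number of recognitions.
    static void recognize(DemoLinguist linguist, int utterances) throws IOException {
        linguist.allocate();
        try {
            for (int i = 0; i < utterances; i++) {
                // Fetch a fresh graph each time; it may change between calls.
                Object graph = linguist.getSearchGraph();
                linguist.startRecognition();
                try {
                    // A real search would walk 'graph' here.
                } finally {
                    linguist.stopRecognition();
                }
            }
        } finally {
            linguist.deallocate();
        }
    }

    // Runs one recognition and returns the recorded call order.
    static List<String> demo() {
        RecordingLinguist linguist = new RecordingLinguist();
        try {
            recognize(linguist, 1);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return linguist.calls;
    }
}
```

The try/finally blocks are a design choice of this sketch: they pair stopRecognition with startRecognition and deallocate with allocate even if the search fails.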

Field Summary

Fields
Modifier and Type	Field and Description
`static java.lang.String`	`PROP_FILLER_INSERTION_PROBABILITY` Filler insertion probability property
`static java.lang.String`	`PROP_LANGUAGE_WEIGHT` The property that defines the language weight for the search
`static java.lang.String`	`PROP_SILENCE_INSERTION_PROBABILITY` Silence insertion probability property
`static java.lang.String`	`PROP_UNIT_INSERTION_PROBABILITY` Unit insertion probability property
`static java.lang.String`	`PROP_WORD_INSERTION_PROBABILITY` Word insertion probability property

Method Summary

Modifier and Type	Method and Description
`void`	`allocate()` Allocates the linguist.
`void`	`deallocate()` Deallocates the linguist.
`SearchGraph`	`getSearchGraph()` Retrieves search graph.
`void`	`startRecognition()` Called before a recognition.
`void`	`stopRecognition()` Called after a recognition.

Methods inherited from interface edu.cmu.sphinx.util.props.Configurable
newProperties

- Field Detail
  - PROP_WORD_INSERTION_PROBABILITY
```
@S4Double(defaultValue=1.0)
static final java.lang.String PROP_WORD_INSERTION_PROBABILITY
```
    Word insertion probability property
    
    See Also:
    
    Constant Field Values
  - PROP_UNIT_INSERTION_PROBABILITY
```
@S4Double(defaultValue=1.0)
static final java.lang.String PROP_UNIT_INSERTION_PROBABILITY
```
    Unit insertion probability property
    
    See Also:
    
    Constant Field Values
  - PROP_SILENCE_INSERTION_PROBABILITY
```
@S4Double(defaultValue=1.0)
static final java.lang.String PROP_SILENCE_INSERTION_PROBABILITY
```
    Silence insertion probability property
    
    See Also:
    
    Constant Field Values
  - PROP_FILLER_INSERTION_PROBABILITY
```
@S4Double(defaultValue=1.0)
static final java.lang.String PROP_FILLER_INSERTION_PROBABILITY
```
    Filler insertion probability property
    
    See Also:
    
    Constant Field Values
  - PROP_LANGUAGE_WEIGHT
```
@S4Double(defaultValue=1.0)
static final java.lang.String PROP_LANGUAGE_WEIGHT
```
    The property that defines the language weight for the search
    
    See Also:
    
    Constant Field Values
- Method Detail
  - getSearchGraph
```
SearchGraph getSearchGraph()
```
    Retrieves search graph. The search graph represents the search space to be used to guide the search.
    Implementor's note: This method is typically called at the beginning of each recognition and therefore should return the search graph as quickly as possible.
    
    Returns:
    
    the search graph
  - startRecognition
```
void startRecognition()
```
    Called before a recognition. This method gives a linguist the opportunity to prepare itself before a recognition begins.
    Implementor's Note - Some linguists (or underlying language or acoustic models) may keep caches or pools that need to be initialized before a recognition. A linguist may implement this method to perform such initialization. Note, however, that an ideal linguist will, once allocated, be stateless. This will allow the linguist to be shared by multiple simultaneous searches. Reliance on 'startRecognition' may prevent a linguist from being used in a multi-threaded search.
  - stopRecognition
```
void stopRecognition()
```
    Called after a recognition. This method gives a linguist the opportunity to clean up after a recognition has been completed.
    Implementor's Note - Some linguists (or underlying language or acoustic models) may keep caches or pools that need to be flushed after a recognition. A linguist may implement this method to perform such flushing. Note, however, that an ideal linguist will, once allocated, be stateless. This will allow the linguist to be shared by multiple simultaneous searches. Reliance on 'stopRecognition' may prevent a linguist from being used in a multi-threaded search.
  - allocate
```
void allocate()
       throws java.io.IOException
```
    Allocates the linguist. Resources allocated by the linguist are allocated here. This method may take many seconds to complete depending upon the linguist.
    Implementor's Note - A well-written linguist will allow allocate to be called multiple times without harm. This will allow a linguist to be shared by multiple search managers.
    
    Throws:
    
    java.io.IOException - if an IO error occurs
  - deallocate
```
void deallocate()
         throws java.io.IOException
```
    Deallocates the linguist. Any resources allocated by this linguist are released.
    Implementor's Note - If the linguist is being shared by multiple searches, deallocate should only actually release resources when the last call to deallocate is made. Two approaches for dealing with this:
    (1) Keep an allocation counter that is incremented during allocate and decremented during deallocate. Only when the counter reaches zero should the actual deallocation be performed.
    (2) Do nothing in deallocate and just let the GC take care of things.
    
    Throws:
    
    java.io.IOException - if an IO error occurs
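Approach (1) can be sketched with a small reference-counted holder. This is a minimal illustration under assumptions, not code from sphinx4: `RefCountedResource` and its `loaded` flag are hypothetical, and a real linguist would load and release its models where the flag is set and cleared.

```java
final class RefCountedResource {
    private int allocations;   // outstanding allocate() calls
    private boolean loaded;    // stands in for the real models being in memory

    // Loads on the first call only; later calls just bump the counter.
    synchronized void allocate() {
        if (allocations++ == 0) {
            loaded = true;     // a real linguist would load models here
        }
    }

    // Releases only when the last outstanding allocation is returned.
    synchronized void deallocate() {
        if (allocations == 0) {
            return;            // unmatched call: nothing to release
        }
        if (--allocations == 0) {
            loaded = false;    // a real linguist would free models here
        }
    }

    synchronized boolean isLoaded() {
        return loaded;
    }
}
```

The methods are synchronized in this sketch because the counter would otherwise be unsafe when several search managers share the linguist across threads.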
