LargeNGramModel (sphinx4-core 5prealpha-SNAPSHOT API)

java.lang.Object
- edu.cmu.sphinx.linguist.language.ngram.large.LargeNGramModel

All Implemented Interfaces:

LanguageModel, Configurable

Direct Known Subclasses:

KeywordOptimizerLargeNGramModel, LargeTrigramModel
```
public class LargeNGramModel
extends java.lang.Object
implements LanguageModel
```
A language model backed by a binary NGram language model file ("DMP file") generated by the SphinxBase sphinx_lm_convert tool.

Field Summary

Fields
Modifier and Type	Field and Description
`protected boolean`	`applyLanguageWeightAndWip`
`static int`	`BYTES_PER_NGRAM` The number of bytes per N-gram in the LM file generated by the CMU-Cambridge Statistical Language Modeling Toolkit.
`static int`	`BYTES_PER_NMAXGRAM`
`protected boolean`	`clearCacheAfterUtterance`
`protected Dictionary`	`dictionary`
`protected java.lang.String`	`format`
`protected boolean`	`fullSmear`
`protected float`	`languageWeight`
`protected java.util.logging.Logger`	`logger`
`protected LogMath`	`logMath`
`protected int`	`maxDepth`
`protected int`	`ngramCacheSize`
`protected java.lang.String`	`ngramLogFile`
`static java.lang.String`	`PROP_APPLY_LANGUAGE_WEIGHT_AND_WIP` The property that controls whether or not the language model will apply the language weight and word insertion probability
`static java.lang.String`	`PROP_CLEAR_CACHES_AFTER_UTTERANCE` The property that controls whether the ngram caches are cleared after every utterance
`static java.lang.String`	`PROP_FULL_SMEAR` If true, use full bigram information to determine smear
`static java.lang.String`	`PROP_LANGUAGE_WEIGHT` The property that defines the language weight for the search
`static java.lang.String`	`PROP_NGRAM_CACHE_SIZE` The property that defines the maximum number of ngrams to be cached
`static java.lang.String`	`PROP_QUERY_LOG_FILE` The property for the name of the file that logs all the queried N-grams.
`static java.lang.String`	`PROP_WORD_INSERTION_PROBABILITY` Word insertion probability property
`protected float`	`unigramWeight`
`protected double`	`wip`

Fields inherited from interface edu.cmu.sphinx.linguist.language.ngram.LanguageModel
PROP_DICTIONARY, PROP_LOCATION, PROP_MAX_DEPTH, PROP_UNIGRAM_WEIGHT

Constructor Summary

Constructors
Constructor and Description
`LargeNGramModel()`
`LargeNGramModel(java.lang.String format, java.net.URL location, java.lang.String ngramLogFile, int maxNGramCacheSize, boolean clearCacheAfterUtterance, int maxDepth, Dictionary dictionary, boolean applyLanguageWeightAndWip, float languageWeight, double wip, float unigramWeight, boolean fullSmear)`

Method Summary

All Methods Instance Methods Concrete Methods
Modifier and Type	Method and Description
`void`	`allocate()` Create the language model
`void`	`deallocate()` Deallocate resources allocated to this language model
`int`	`getMaxDepth()` Returns the maximum depth of the language model
`int`	`getNGramHits()` Returns the number of NGram hits.
`int`	`getNGramMisses()` Returns the number of times an NGram was queried but was not found in the LM (in which case the backoff probabilities are used).
`float`	`getProbability(WordSequence wordSequence)` Gets the ngram probability of the word sequence represented by the word list
`float`	`getSmear(WordSequence wordSequence)` Gets the smear term for the given wordSequence.
`float`	`getSmearOld(WordSequence wordSequence)` Gets the smear term for the given wordSequence
`java.util.Set<java.lang.String>`	`getVocabulary()` Returns the set of words in the language model.
`int`	`getWordID(Word word)` Returns the ID of the given word.
`boolean`	`hasWord(Word w)` Returns true if the language model contains the given word
`void`	`newProperties(PropertySheet ps)` This method is called when this configurable component needs to be reconfigured.
`void`	`onUtteranceEnd()` Called on utterance end to clear cache if needed

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - PROP_QUERY_LOG_FILE
```
@S4String(mandatory=false)
public static final java.lang.String PROP_QUERY_LOG_FILE
```
    The property for the name of the file that logs all the queried N-grams. If this property is null, the queried N-grams are not logged.
    
    See Also:
    
    Constant Field Values
  - PROP_NGRAM_CACHE_SIZE
```
@S4Integer(defaultValue=100000)
public static final java.lang.String PROP_NGRAM_CACHE_SIZE
```
    The property that defines the maximum number of ngrams to be cached
    
    See Also:
    
    Constant Field Values
  - PROP_CLEAR_CACHES_AFTER_UTTERANCE
```
@S4Boolean(defaultValue=false)
public static final java.lang.String PROP_CLEAR_CACHES_AFTER_UTTERANCE
```
    The property that controls whether the ngram caches are cleared after every utterance
    
    See Also:
    
    Constant Field Values
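    The two cache properties above describe a bounded ngram cache that may be emptied between utterances. The following is a minimal sketch of that behaviour, assuming least-recently-used eviction (the source does not say which eviction policy LargeNGramModel uses). The class `BoundedNGramCache` and its methods are hypothetical names and not part of sphinx4.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the sphinx4 implementation.
class BoundedNGramCache extends LinkedHashMap<String, Float> {
    private final int maxEntries;               // plays the role of the ngram cache size
    private final boolean clearAfterUtterance;  // plays the role of the clear-after-utterance flag

    BoundedNGramCache(int maxEntries, boolean clearAfterUtterance) {
        // accessOrder = true keeps entries ordered from least to most recently used
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
        this.clearAfterUtterance = clearAfterUtterance;
    }

    // Called by LinkedHashMap after each insertion; returning true evicts the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<String, Float> eldest) {
        return size() > maxEntries;
    }

    // Empties the cache at the end of an utterance when the flag is set.
    void onUtteranceEnd() {
        if (clearAfterUtterance) {
            clear();
        }
    }
}
```

    A larger cache trades memory for fewer lookups in the model file; clearing after each utterance keeps memory use flat across a long session at the cost of a cold cache at the start of the next utterance.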
  - PROP_LANGUAGE_WEIGHT
```
@S4Double(defaultValue=1.0)
public static final java.lang.String PROP_LANGUAGE_WEIGHT
```
    The property that defines the language weight for the search
    
    See Also:
    
    Constant Field Values
  - PROP_APPLY_LANGUAGE_WEIGHT_AND_WIP
```
@S4Boolean(defaultValue=false)
public static final java.lang.String PROP_APPLY_LANGUAGE_WEIGHT_AND_WIP
```
    The property that controls whether or not the language model will apply the language weight and word insertion probability
    
    See Also:
    
    Constant Field Values
  - PROP_WORD_INSERTION_PROBABILITY
```
@S4Double(defaultValue=1.0)
public static final java.lang.String PROP_WORD_INSERTION_PROBABILITY
```
    Word insertion probability property
    
    See Also:
    
    Constant Field Values
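    The language weight, the apply flag and the word insertion probability above interact when a score is produced. The following is a minimal sketch of the conventional log-domain combination, assuming the weight scales the log probability and the insertion probability is added as a log term. The class `WeightedScore` is hypothetical, natural logarithms are used for simplicity, and the exact formula used by LargeNGramModel is not stated in the source.

```java
// Hypothetical illustration only; not the sphinx4 implementation.
class WeightedScore {
    // logProb: log-domain language model probability (natural log in this sketch)
    // applyWeights: mirrors the idea of the apply-language-weight-and-WIP flag
    // languageWeight: scaling factor applied to the log probability
    // wip: word insertion probability as a linear value in (0, 1]
    static double apply(double logProb, boolean applyWeights, double languageWeight, double wip) {
        if (!applyWeights) {
            return logProb; // leave the raw score for the caller to weight
        }
        return logProb * languageWeight + Math.log(wip);
    }
}
```

    With the defaults listed above (weight 1.0, insertion probability 1.0) this combination leaves the score unchanged, because multiplying by 1.0 and adding log(1.0) = 0 are both no-ops.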
  - PROP_FULL_SMEAR
```
@S4Boolean(defaultValue=false)
public static final java.lang.String PROP_FULL_SMEAR
```
    If true, use full bigram information to determine smear
    
    See Also:
    
    Constant Field Values
  - BYTES_PER_NGRAM
```
public static final int BYTES_PER_NGRAM
```
    The number of bytes per N-gram in the LM file generated by the CMU-Cambridge Statistical Language Modeling Toolkit.
    
    See Also:
    
    Constant Field Values
  - BYTES_PER_NMAXGRAM
```
public static final int BYTES_PER_NMAXGRAM
```
    See Also:
    
    Constant Field Values
  - logger
```
protected java.util.logging.Logger logger
```
  - logMath
```
protected LogMath logMath
```
  - maxDepth
```
protected int maxDepth
```
  - ngramCacheSize
```
protected int ngramCacheSize
```
  - clearCacheAfterUtterance
```
protected boolean clearCacheAfterUtterance
```
  - fullSmear
```
protected boolean fullSmear
```
  - dictionary
```
protected Dictionary dictionary
```
  - format
```
protected java.lang.String format
```
  - applyLanguageWeightAndWip
```
protected boolean applyLanguageWeightAndWip
```
  - languageWeight
```
protected float languageWeight
```
  - unigramWeight
```
protected float unigramWeight
```
  - wip
```
protected double wip
```
  - ngramLogFile
```
protected java.lang.String ngramLogFile
```
- Constructor Detail
  - LargeNGramModel
```
public LargeNGramModel(java.lang.String format,
                       java.net.URL location,
                       java.lang.String ngramLogFile,
                       int maxNGramCacheSize,
                       boolean clearCacheAfterUtterance,
                       int maxDepth,
                       Dictionary dictionary,
                       boolean applyLanguageWeightAndWip,
                       float languageWeight,
                       double wip,
                       float unigramWeight,
                       boolean fullSmear)
```
  - LargeNGramModel
```
public LargeNGramModel()
```
- Method Detail
  - newProperties
```
public void newProperties(PropertySheet ps)
                   throws PropertyException
```
    Description copied from interface: Configurable
    
    This method is called when this configurable component needs to be reconfigured.
    
    Specified by:
    
    newProperties in interface Configurable
    
    Parameters:
    
    ps - a property sheet holding the new data
    
    Throws:
    
    PropertyException - if there is a problem with the properties.
  - allocate
```
public void allocate()
              throws java.io.IOException
```
    Description copied from interface: LanguageModel
    
    Create the language model
    
    Specified by:
    
    allocate in interface LanguageModel
    
    Throws:
    
    java.io.IOException - if an error occurs
  - deallocate
```
public void deallocate()
                throws java.io.IOException
```
    Description copied from interface: LanguageModel
    
    Deallocate resources allocated to this language model
    
    Specified by:
    
    deallocate in interface LanguageModel
    
    Throws:
    
    java.io.IOException - if an error occurs
  - onUtteranceEnd
```
public void onUtteranceEnd()
```
    Description copied from interface: LanguageModel
    
    Called on utterance end to clear cache if needed
    
    Specified by:
    
    onUtteranceEnd in interface LanguageModel
  - getProbability
```
public float getProbability(WordSequence wordSequence)
```
    Gets the ngram probability of the word sequence represented by the word list
    
    Specified by:
    
    getProbability in interface LanguageModel
    
    Parameters:
    
    wordSequence - the word sequence
    
    Returns:
    
    the probability of the word sequence. Probability is in logMath log base
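    The returned value is a log-domain score in the LogMath base, not a linear probability between 0 and 1. The following is a minimal sketch of converting between the two, assuming a base of 1.0001 (an illustrative value; the source does not state the base). The class `LogBase` and its methods are hypothetical and stand in for whatever the model's LogMath provides.

```java
// Hypothetical illustration only; not the sphinx4 LogMath class.
class LogBase {
    static final double BASE = 1.0001; // assumed log base for this sketch

    // Converts a linear probability to the log domain: log_BASE(p) = ln(p) / ln(BASE).
    static float linearToLog(double p) {
        return (float) (Math.log(p) / Math.log(BASE));
    }

    // Converts a log-domain value back to a linear probability: BASE^logValue.
    static double logToLinear(float logValue) {
        return Math.exp(logValue * Math.log(BASE));
    }
}
```

    A base this close to 1 spreads probabilities over a wide range of log values, so scores for ordinary probabilities are large negative numbers and should not be compared against natural-log or base-10 values without converting first.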
  - getWordID
```
public final int getWordID(Word word)
```
    Returns the ID of the given word.
    
    Parameters:
    
    word - the word whose ID is to be found
    
    Returns:
    
    the ID of the word
  - hasWord
```
public boolean hasWord(Word w)
```
    Returns true if the language model contains the given word
    
    Parameters:
    
    w - word
    
    Returns:
    
    true if the word is in the language model
  - getSmearOld
```
public float getSmearOld(WordSequence wordSequence)
```
    Gets the smear term for the given wordSequence
    
    Parameters:
    
    wordSequence - the word sequence
    
    Returns:
    
    the smear term associated with this word sequence
  - getSmear
```
public float getSmear(WordSequence wordSequence)
```
    Description copied from interface: LanguageModel
    
    Gets the smear term for the given wordSequence. Used in LexTreeLinguist. See LexTreeLinguist.PROP_WANT_UNIGRAM_SMEAR for details.
    
    Specified by:
    
    getSmear in interface LanguageModel
    
    Parameters:
    
    wordSequence - the word sequence
    
    Returns:
    
    the smear term associated with this word sequence
  - getMaxDepth
```
public int getMaxDepth()
```
    Returns the maximum depth of the language model
    
    Specified by:
    
    getMaxDepth in interface LanguageModel
    
    Returns:
    
    the maximum depth of the language model
  - getVocabulary
```
public java.util.Set<java.lang.String> getVocabulary()
```
    Returns the set of words in the language model. The set is unmodifiable.
    
    Specified by:
    
    getVocabulary in interface LanguageModel
    
    Returns:
    
    the unmodifiable set of words
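    Because the returned set is unmodifiable, callers that want to add or remove words need their own copy. The following is a minimal sketch of that pattern using only the Java collections API. The class `VocabularyView` and its sample words are hypothetical and merely stand in for a set obtained from getVocabulary().

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; stands in for a set returned by getVocabulary().
class VocabularyView {
    // Builds an unmodifiable set with made-up sample words.
    static Set<String> sample() {
        Set<String> words = new HashSet<>();
        words.add("hello");
        words.add("world");
        return Collections.unmodifiableSet(words);
    }

    // Returns a private, modifiable copy that the caller may change freely.
    static Set<String> modifiableCopy(Set<String> vocabulary) {
        return new HashSet<>(vocabulary);
    }
}
```

    Calling `add` or `remove` directly on an unmodifiable set throws `UnsupportedOperationException`, so the copy is the safe place for any changes.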
  - getNGramMisses
```
public int getNGramMisses()
```
    Returns the number of times an NGram was queried but was not found in the LM (in which case the backoff probabilities are used).
    
    Returns:
    
    the number of NGram misses
  - getNGramHits
```
public int getNGramHits()
```
    Returns the number of NGram hits.
    
    Returns:
    
    the number of NGram hits
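    The hit and miss counters above relate to backoff: when a queried NGram is absent, a score is built from a backoff weight and a shorter NGram. The following is a minimal sketch of that lookup in the log domain, assuming the standard backoff scheme and a simple counting rule of one hit or miss per lookup level. The class `BackoffSketch`, its map-based storage and its counting rule are hypothetical and are not taken from the LargeNGramModel source.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of backoff lookup; not the sphinx4 implementation.
class BackoffSketch {
    final Map<String, Double> logProb = new HashMap<>();    // ngram -> log probability
    final Map<String, Double> logBackoff = new HashMap<>(); // history -> log backoff weight
    int hits;
    int misses;

    // Scores the last word of `words`, using the preceding words as history.
    double score(String... words) {
        Double p = logProb.get(String.join(" ", words));
        if (p != null) {
            hits++;
            return p;
        }
        misses++;
        if (words.length == 1) {
            return Double.NEGATIVE_INFINITY; // word unknown to this toy model
        }
        // Back off: add the history's backoff weight (0 in the log domain if absent)
        // and retry with the oldest history word dropped.
        String history = String.join(" ", Arrays.copyOfRange(words, 0, words.length - 1));
        double backoff = logBackoff.getOrDefault(history, 0.0);
        return backoff + score(Arrays.copyOfRange(words, 1, words.length));
    }
}
```

    A high miss count relative to hits suggests that the recognizer is often querying word sequences the model has not seen, so many scores come from lower-order NGrams.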
