public class LargeNGramModel extends java.lang.Object implements LanguageModel
Modifier and Type | Field and Description |
---|---|
protected boolean | applyLanguageWeightAndWip |
static int | BYTES_PER_NGRAM: The number of bytes per N-gram in the LM file generated by the CMU-Cambridge Statistical Language Modeling Toolkit. |
static int | BYTES_PER_NMAXGRAM |
protected boolean | clearCacheAfterUtterance |
protected Dictionary | dictionary |
protected java.lang.String | format |
protected boolean | fullSmear |
protected float | languageWeight |
protected java.util.logging.Logger | logger |
protected LogMath | logMath |
protected int | maxDepth |
protected int | ngramCacheSize |
protected java.lang.String | ngramLogFile |
static java.lang.String | PROP_APPLY_LANGUAGE_WEIGHT_AND_WIP: The property that controls whether or not the language model will apply the language weight and word insertion probability |
static java.lang.String | PROP_CLEAR_CACHES_AFTER_UTTERANCE: The property that controls whether the ngram caches are cleared after every utterance |
static java.lang.String | PROP_FULL_SMEAR: If true, use full bigram information to determine smear |
static java.lang.String | PROP_LANGUAGE_WEIGHT: The property that defines the language weight for the search |
static java.lang.String | PROP_NGRAM_CACHE_SIZE: The property that defines the maximum number of ngrams to be cached |
static java.lang.String | PROP_QUERY_LOG_FILE: The property for the name of the file that logs all the queried N-grams. |
static java.lang.String | PROP_WORD_INSERTION_PROBABILITY: Word insertion probability property |
protected float | unigramWeight |
protected double | wip |
Fields inherited from interface LanguageModel: PROP_DICTIONARY, PROP_LOCATION, PROP_MAX_DEPTH, PROP_UNIGRAM_WEIGHT
Constructor and Description |
---|
LargeNGramModel() |
LargeNGramModel(java.lang.String format, java.net.URL location, java.lang.String ngramLogFile, int maxNGramCacheSize, boolean clearCacheAfterUtterance, int maxDepth, Dictionary dictionary, boolean applyLanguageWeightAndWip, float languageWeight, double wip, float unigramWeight, boolean fullSmear) |
Modifier and Type | Method and Description |
---|---|
void | allocate(): Create the language model |
void | deallocate(): Deallocate resources allocated to this language model |
int | getMaxDepth(): Returns the maximum depth of the language model |
int | getNGramHits(): Returns the number of NGram hits. |
int | getNGramMisses(): Returns the number of times an NGram was queried but not found in the LM (in which case the backoff probabilities are used). |
float | getProbability(WordSequence wordSequence): Gets the ngram probability of the word sequence represented by the word list |
float | getSmear(WordSequence wordSequence): Gets the smear term for the given wordSequence. |
float | getSmearOld(WordSequence wordSequence): Gets the smear term for the given wordSequence |
java.util.Set<java.lang.String> | getVocabulary(): Returns the set of words in the language model. |
int | getWordID(Word word): Returns the ID of the given word. |
boolean | hasWord(Word w): Returns true if the language model contains the given word |
void | newProperties(PropertySheet ps): This method is called when this configurable component needs to be reconfigured. |
void | onUtteranceEnd(): Called on utterance end to clear cache if needed |
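The summary above lists allocate(), onUtteranceEnd(), and deallocate() but does not spell out the order in which a caller is expected to invoke them. The following is a minimal sketch of one plausible call order, assuming the model is allocated once, notified after each utterance, and deallocated at the end. LifecycleModel, LifecycleSketch, RecordingModel, and decode are hypothetical stand-ins written for this illustration; they are not Sphinx4 types.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in that mirrors the lifecycle methods listed above. */
interface LifecycleModel {
    void allocate() throws IOException;
    void deallocate() throws IOException;
    void onUtteranceEnd();
}

final class LifecycleSketch {

    /** Toy model that only records which lifecycle methods were called. */
    static final class RecordingModel implements LifecycleModel {
        final List<String> calls = new ArrayList<>();

        public void allocate() { calls.add("allocate"); }
        public void deallocate() { calls.add("deallocate"); }
        public void onUtteranceEnd() { calls.add("onUtteranceEnd"); }
    }

    /** Assumed call order: allocate once, notify per utterance, always deallocate. */
    static void decode(LifecycleModel lm, int utterances) throws IOException {
        lm.allocate();
        try {
            for (int i = 0; i < utterances; i++) {
                // ... probability queries for utterance i would happen here ...
                lm.onUtteranceEnd();
            }
        } finally {
            lm.deallocate(); // release resources even if decoding fails
        }
    }
}
```

The try/finally block is a design choice of the sketch: it keeps deallocate() paired with allocate() even when decoding throws.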
@S4String(mandatory=false) public static final java.lang.String PROP_QUERY_LOG_FILE
@S4Integer(defaultValue=100000) public static final java.lang.String PROP_NGRAM_CACHE_SIZE
@S4Boolean(defaultValue=false) public static final java.lang.String PROP_CLEAR_CACHES_AFTER_UTTERANCE
@S4Double(defaultValue=1.0) public static final java.lang.String PROP_LANGUAGE_WEIGHT
@S4Boolean(defaultValue=false) public static final java.lang.String PROP_APPLY_LANGUAGE_WEIGHT_AND_WIP
@S4Double(defaultValue=1.0) public static final java.lang.String PROP_WORD_INSERTION_PROBABILITY
@S4Boolean(defaultValue=false) public static final java.lang.String PROP_FULL_SMEAR
public static final int BYTES_PER_NGRAM
public static final int BYTES_PER_NMAXGRAM
protected java.util.logging.Logger logger
protected LogMath logMath
protected int maxDepth
protected int ngramCacheSize
protected boolean clearCacheAfterUtterance
protected boolean fullSmear
protected Dictionary dictionary
protected java.lang.String format
protected boolean applyLanguageWeightAndWip
protected float languageWeight
protected float unigramWeight
protected double wip
protected java.lang.String ngramLogFile
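The fields applyLanguageWeightAndWip, languageWeight, and wip indicate that raw N-gram scores can be rescaled before the search uses them, but this page does not give the formula. As a minimal sketch of one common decoder convention (an assumption, not confirmed by this page), the log probability is multiplied by the language weight and the log of the word insertion probability is added. WeightSketch and applyWeights are hypothetical names, and natural logarithms stand in for whatever base LogMath is configured with.

```java
/** Hypothetical helper for illustration; not part of Sphinx4. */
final class WeightSketch {

    /**
     * Rescales a natural-log probability under the assumed convention
     * logProb * languageWeight + log(wip). Returns the raw score unchanged
     * when weighting is disabled.
     */
    static double applyWeights(double logProb, boolean applyLanguageWeightAndWip,
                               float languageWeight, double wip) {
        if (!applyLanguageWeightAndWip) {
            return logProb;
        }
        return logProb * languageWeight + Math.log(wip);
    }

    public static void main(String[] args) {
        double raw = Math.log(0.01);
        System.out.println("raw      = " + raw);
        System.out.println("weighted = " + applyWeights(raw, true, 8.0f, 0.5));
    }
}
```

Under this assumed formula, a language weight of 1.0 and a word insertion probability of 1.0 (the default values declared on PROP_LANGUAGE_WEIGHT and PROP_WORD_INSERTION_PROBABILITY) leave the score unchanged.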
public LargeNGramModel(java.lang.String format, java.net.URL location, java.lang.String ngramLogFile, int maxNGramCacheSize, boolean clearCacheAfterUtterance, int maxDepth, Dictionary dictionary, boolean applyLanguageWeightAndWip, float languageWeight, double wip, float unigramWeight, boolean fullSmear)
public LargeNGramModel()
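public void newProperties(PropertySheet ps) throws PropertyException
Description copied from interface: Configurable
This method is called when this configurable component needs to be reconfigured.
Specified by:
newProperties in interface Configurable
Parameters:
ps - a property sheet holding the new data
Throws:
PropertyException - if there is a problem with the properties.
public void allocate() throws java.io.IOException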
Description copied from interface: LanguageModel
Create the language model
Specified by:
allocate in interface LanguageModel
Throws:
java.io.IOException - if an error occurs
public void deallocate() throws java.io.IOException
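Description copied from interface: LanguageModel
Deallocate resources allocated to this language model
Specified by:
deallocate in interface LanguageModel
Throws:
java.io.IOException - if an error occurs
public void onUtteranceEnd()
Description copied from interface: LanguageModel
Called on utterance end to clear cache if needed
Specified by:
onUtteranceEnd in interface LanguageModel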
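PROP_NGRAM_CACHE_SIZE caps the number of cached N-grams, and onUtteranceEnd() clears the cache when clearCacheAfterUtterance is set. The sketch below shows how those two settings could interact. It is only an illustration: NGramCacheSketch is a hypothetical class, keying the cache by a plain string is a simplification, and least-recently-used eviction is an assumption, since this page does not say how the real cache chooses what to drop.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical bounded N-gram cache; not the Sphinx4 implementation. */
final class NGramCacheSketch {
    private final boolean clearCacheAfterUtterance;
    private final Map<String, Float> cache;

    NGramCacheSketch(final int ngramCacheSize, boolean clearCacheAfterUtterance) {
        this.clearCacheAfterUtterance = clearCacheAfterUtterance;
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.cache = new LinkedHashMap<String, Float>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Float> eldest) {
                // Evict the least recently used entry once the cap is exceeded.
                return size() > ngramCacheSize;
            }
        };
    }

    void put(String ngram, float logProb) {
        cache.put(ngram, logProb);
    }

    /** Returns the cached score, or null if the N-gram is not cached. */
    Float get(String ngram) {
        return cache.get(ngram);
    }

    int size() {
        return cache.size();
    }

    /** Mirrors the documented behavior: clear the cache only if configured to. */
    void onUtteranceEnd() {
        if (clearCacheAfterUtterance) {
            cache.clear();
        }
    }
}
```

Clearing after every utterance trades repeated lookups for a smaller memory footprint between utterances; leaving clearCacheAfterUtterance at its default of false keeps cached entries available across utterances.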
public float getProbability(WordSequence wordSequence)
Specified by:
getProbability in interface LanguageModel
Parameters:
wordSequence - the word sequence
public final int getWordID(Word word)
Parameters:
word - the word whose ID is requested
public boolean hasWord(Word w)
Parameters:
w - the word to look up
public float getSmearOld(WordSequence wordSequence)
Parameters:
wordSequence - the word sequence
public float getSmear(WordSequence wordSequence)
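Description copied from interface: LanguageModel
Gets the smear term for the given wordSequence. Used in LexTreeLinguist. See LexTreeLinguist.PROP_WANT_UNIGRAM_SMEAR for details.
Specified by:
getSmear in interface LanguageModel
Parameters:
wordSequence - the word sequence
public int getMaxDepth()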
Specified by:
getMaxDepth in interface LanguageModel
public java.util.Set<java.lang.String> getVocabulary()
Specified by:
getVocabulary in interface LanguageModel
public int getNGramMisses()
public int getNGramHits()
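getNGramHits() and getNGramMisses() count, respectively, queries whose N-gram is present in the LM and queries that fall back to backoff probabilities. The sketch below illustrates that bookkeeping on a toy bigram model using the standard backoff rule: if the bigram is missing, add the backoff weight of the history word to the unigram log probability of the predicted word. BackoffSketch and its methods are hypothetical, words are plain strings rather than Word or WordSequence objects, and the real class may count hits and misses differently.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical toy bigram model with backoff; not the Sphinx4 implementation. */
final class BackoffSketch {
    private final Map<String, Float> unigramLogProb = new HashMap<>();
    private final Map<String, Float> unigramBackoff = new HashMap<>();
    private final Map<String, Float> bigramLogProb = new HashMap<>();
    private int hits;
    private int misses;

    void addUnigram(String word, float logProb, float logBackoff) {
        unigramLogProb.put(word, logProb);
        unigramBackoff.put(word, logBackoff);
    }

    void addBigram(String first, String second, float logProb) {
        bigramLogProb.put(first + " " + second, logProb);
    }

    /**
     * Returns log P(second | first). Assumes both words were added as unigrams.
     * A stored bigram counts as a hit; otherwise the query counts as a miss and
     * the score is backoff(first) + logP(second).
     */
    float getProbability(String first, String second) {
        Float bigram = bigramLogProb.get(first + " " + second);
        if (bigram != null) {
            hits++;
            return bigram;
        }
        misses++;
        return unigramBackoff.get(first) + unigramLogProb.get(second);
    }

    int getNGramHits() {
        return hits;
    }

    int getNGramMisses() {
        return misses;
    }
}
```

A high miss count relative to hits would suggest that many queried word sequences are not covered by the highest-order N-grams in the model.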