SpeechClassifier (sphinx4-core 5prealpha-SNAPSHOT API)

java.lang.Object
- edu.cmu.sphinx.util.props.ConfigurableAdapter
  - edu.cmu.sphinx.frontend.BaseDataProcessor
    - edu.cmu.sphinx.frontend.endpoint.AbstractVoiceActivityDetector
      - edu.cmu.sphinx.frontend.endpoint.SpeechClassifier

All Implemented Interfaces:

DataProcessor, Configurable
```
public class SpeechClassifier
extends AbstractVoiceActivityDetector
```
Implements a level-tracking endpointer invented by Bent Schmidt Nielsen.
This endpointer consists of two main steps:
1. classifying audio into speech and non-speech
2. inserting SPEECH_START and SPEECH_END signals around speech and removing non-speech regions
The first step, classification of audio into speech and non-speech, uses Bent Schmidt Nielsen's algorithm. Each time audio comes in, the average signal level and the background noise level are updated using the signal level of the current audio. If the average signal level exceeds the background noise level by more than a configurable threshold, the current audio is marked as speech. Otherwise, it is marked as non-speech.
The second step of this endpointer is documented in the class SpeechMarker.
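The classification step described above can be sketched as follows. This is a minimal, self-contained illustration and not the sphinx4 source: the class name `LevelTrackerSketch`, the smoothing weight `averageNumber = 1.0`, the choice to seed both levels from the first valid frame, and the exact update rules for `level` and `background` are all assumptions. Only the final comparison (average level exceeds background by more than the threshold) is taken from the description.

```java
// Illustrative sketch only; names and update rules are assumptions, not the sphinx4 implementation.
class LevelTrackerSketch {
    private final double adjustment;          // how quickly the background drifts up toward the current level
    private final double threshold;           // required gap between average level and background
    private final double minSignal;           // frames quieter than this do not update the tracker
    private final double averageNumber = 1.0; // assumed weight of the previous average level

    private double level;       // running average signal level
    private double background;  // running background noise level
    private boolean started;    // false until the first valid frame arrives

    LevelTrackerSketch(double adjustment, double threshold, double minSignal) {
        this.adjustment = adjustment;
        this.threshold = threshold;
        this.minSignal = minSignal;
    }

    /** Updates both levels with the current frame level and returns true if the frame counts as speech. */
    boolean classify(double current) {
        if (current < minSignal) {
            return false; // too quiet to be a valid signal; leave the tracker untouched
        }
        if (!started) {
            // Assumption: seed both levels from the first valid frame.
            level = current;
            background = current;
            started = true;
        }
        // Smooth the average signal level.
        level = (level * averageNumber + current) / (averageNumber + 1);
        // Follow drops in level immediately, but rise only slowly.
        if (current < background) {
            background = current;
        } else {
            background += (current - background) * adjustment;
        }
        if (level < background) {
            level = background;
        }
        return level - background > threshold;
    }

    public static void main(String[] args) {
        // Constructor arguments mirror the documented defaults for adjustment, threshold and minSignal.
        LevelTrackerSketch tracker = new LevelTrackerSketch(0.003, 10.0, 0.0);
        double[] frameLevels = {30.0, 60.0, 30.0}; // hypothetical per-frame signal levels
        for (double frameLevel : frameLevels) {
            System.out.println(frameLevel + " -> speech? " + tracker.classify(frameLevel));
        }
    }
}
```

Letting the background fall immediately but rise slowly is one common way to keep a short burst of speech from being absorbed into the noise estimate; whether SpeechClassifier does exactly this is not stated on this page.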
See Also:

SpeechMarker

Field Summary

Fields
Modifier and Type	Field and Description
`protected double`	`adjustment`
`protected double`	`averageNumber`
`protected double`	`background` background signal level.
`protected long`	`backgroundFrames`
`protected float`	`frameLengthSec`
`protected boolean`	`isSpeech`
`protected double`	`level` average signal level.
`protected double`	`minSignal` minimum valid signal level.
`static java.lang.String`	`PROP_ADJUSTMENT` The property specifying the adjustment.
`static java.lang.String`	`PROP_FRAME_LENGTH_MS` The property specifying the endpointing frame length in milliseconds.
`static java.lang.String`	`PROP_MIN_SIGNAL` The property specifying the minimum signal level used to update the background signal level.
`static java.lang.String`	`PROP_THRESHOLD` The property specifying the threshold.
`protected long`	`speechFrames`
`protected double`	`threshold`
`protected double`	`totalBackgroundLevel`
`protected double`	`totalSpeechLevel`

Fields inherited from class edu.cmu.sphinx.util.props.ConfigurableAdapter
logger

Constructor Summary

Constructors
Constructor and Description
`SpeechClassifier()`
`SpeechClassifier(int frameLengthMs, double adjustment, double threshold, double minSignal)`

Method Summary

Modifier and Type	Method and Description
`protected SpeechClassifiedData`	`classify(DoubleData audio)` Classifies the given audio frame as speech or not, and updates the endpointing parameters.
`Data`	`getData()` Returns the next Data object.
`boolean`	`getNoisy()` Returns an estimate of whether the input data was noisy enough to break recognition.
`double`	`getSNR()` Retrieves the accumulated signal-to-noise ratio in dB.
`void`	`initialize()` Initializes this LevelTracker endpointer and DataProcessor predecessor.
`boolean`	`isSpeech()` Returns whether the most recently returned frame contains speech.
`static double`	`logRootMeanSquare(double[] samples)` Returns the logarithm base 10 of the root mean square of the given samples.
`void`	`newProperties(PropertySheet ps)` This method is called when this configurable component needs to be reconfigured.
`protected void`	`reset()` Resets this LevelTracker to a starting state.

Methods inherited from class edu.cmu.sphinx.frontend.BaseDataProcessor
getPredecessor, setPredecessor

Methods inherited from class edu.cmu.sphinx.util.props.ConfigurableAdapter
getName, initLogger, toString

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

- Field Detail
  - PROP_FRAME_LENGTH_MS
```
@S4Integer(defaultValue=10)
public static final java.lang.String PROP_FRAME_LENGTH_MS
```
    The property specifying the endpointing frame length in milliseconds.
    
    See Also:
    
    Constant Field Values
  - PROP_MIN_SIGNAL
```
@S4Double(defaultValue=0.0)
public static final java.lang.String PROP_MIN_SIGNAL
```
    The property specifying the minimum signal level used to update the background signal level.
    
    See Also:
    
    Constant Field Values
  - PROP_THRESHOLD
```
@S4Double(defaultValue=10.0)
public static final java.lang.String PROP_THRESHOLD
```
    The property specifying the threshold. If the current signal level is greater than the background level by this threshold, then the current signal is marked as speech. Therefore, a lower threshold will make the endpointer more sensitive, that is, mark more audio as speech. A higher threshold will make the endpointer less sensitive, that is, mark less audio as speech.
    
    See Also:
    
    Constant Field Values
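    The effect of the threshold on sensitivity can be illustrated with a small sketch. The class `ThresholdSketch` and the level values are hypothetical; only the comparison rule comes from the description above.

```java
// Illustrative sketch only; the class name and the sample values are hypothetical.
class ThresholdSketch {
    /** Documented rule: speech when the signal level exceeds the background by more than the threshold. */
    static boolean isSpeech(double level, double background, double threshold) {
        return level - background > threshold;
    }

    public static void main(String[] args) {
        double level = 42.0;      // hypothetical current signal level
        double background = 30.0; // hypothetical background level
        // The same frame is classified differently as the threshold changes.
        System.out.println("threshold 10.0 -> " + isSpeech(level, background, 10.0));
        System.out.println("threshold 15.0 -> " + isSpeech(level, background, 15.0));
    }
}
```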
  - PROP_ADJUSTMENT
```
@S4Double(defaultValue=0.003)
public static final java.lang.String PROP_ADJUSTMENT
```
    The property specifying the adjustment.
    
    See Also:
    
    Constant Field Values
  - averageNumber
```
protected final double averageNumber
```
    See Also:
    
    Constant Field Values
  - adjustment
```
protected double adjustment
```
  - level
```
protected double level
```
    average signal level.
  - background
```
protected double background
```
    background signal level.
  - minSignal
```
protected double minSignal
```
    minimum valid signal level.
  - threshold
```
protected double threshold
```
  - frameLengthSec
```
protected float frameLengthSec
```
  - isSpeech
```
protected boolean isSpeech
```
  - speechFrames
```
protected long speechFrames
```
  - backgroundFrames
```
protected long backgroundFrames
```
  - totalBackgroundLevel
```
protected double totalBackgroundLevel
```
  - totalSpeechLevel
```
protected double totalSpeechLevel
```
- Constructor Detail
  - SpeechClassifier
```
public SpeechClassifier(int frameLengthMs,
                        double adjustment,
                        double threshold,
                        double minSignal)
```
  - SpeechClassifier
```
public SpeechClassifier()
```
- Method Detail
  - newProperties
```
public void newProperties(PropertySheet ps)
                   throws PropertyException
```
    Description copied from interface: Configurable
    
    This method is called when this configurable component needs to be reconfigured.
    
    Specified by:
    
    newProperties in interface Configurable
    
    Overrides:
    
    newProperties in class ConfigurableAdapter
    
    Parameters:
    
    ps - a property sheet holding the new data
    
    Throws:
    
    PropertyException - if there is a problem with the properties.
  - initialize
```
public void initialize()
```
    Initializes this LevelTracker endpointer and DataProcessor predecessor.
    
    Specified by:
    
    initialize in interface DataProcessor
    
    Overrides:
    
    initialize in class BaseDataProcessor
  - reset
```
protected void reset()
```
    Resets this LevelTracker to a starting state.
  - logRootMeanSquare
```
public static double logRootMeanSquare(double[] samples)
```
    Returns the logarithm base 10 of the root mean square of the given samples.
    
    Parameters:
    
    samples - the samples
    
    Returns:
    
    the base-10 logarithm of the root mean square of the samples
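    A minimal sketch of the computation as described here, assuming a plain base-10 logarithm of the RMS value. The class `LogRmsSketch` is hypothetical, and the actual method may clamp or scale the result in ways this page does not state.

```java
// Illustrative sketch only; follows the one-line description, not the sphinx4 source.
class LogRmsSketch {
    /** Returns log10 of the root mean square of the samples. */
    static double logRootMeanSquare(double[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("samples must not be empty");
        }
        double sumOfSquares = 0.0;
        for (double sample : samples) {
            sumOfSquares += sample * sample;
        }
        double rootMeanSquare = Math.sqrt(sumOfSquares / samples.length);
        // Note: an all-zero frame yields negative infinity in this sketch.
        return Math.log10(rootMeanSquare);
    }

    public static void main(String[] args) {
        double[] frame = {100.0, -100.0, 100.0, -100.0}; // hypothetical samples with RMS 100
        System.out.println(logRootMeanSquare(frame));
    }
}
```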
  - classify
```
protected SpeechClassifiedData classify(DoubleData audio)
```
    Classifies the given audio frame as speech or not, and updates the endpointing parameters.
    
    Parameters:
    
    audio - the audio frame
    
    Returns:
    
    Data with classification flag
  - getData
```
public Data getData()
             throws DataProcessingException
```
    Returns the next Data object.
    
    Specified by:
    
    getData in interface DataProcessor
    
    Specified by:
    
    getData in class BaseDataProcessor
    
    Returns:
    
    the next Data object, or null if none available
    
    Throws:
    
    DataProcessingException - if a data processing error occurs
  - isSpeech
```
public boolean isSpeech()
```
    Returns whether the most recently returned frame contains speech. A noise filter could use this, for example, to adjust its noise spectrum estimate.
    
    Specified by:
    
    isSpeech in class AbstractVoiceActivityDetector
    
    Returns:
    
    true if the current frame is speech
  - getSNR
```
public double getSNR()
```
    Retrieves the accumulated signal-to-noise ratio in dB.
    
    Returns:
    
    signal to noise ratio
  - getNoisy
```
public boolean getNoisy()
```
    Returns an estimate of whether the input data was noisy enough to break recognition. The audio is considered noisy if the signal-to-noise ratio is less than 20 dB.
    
    Returns:
    
    true if the data is estimated to be noisy
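    One way the accumulated counters could yield an SNR and the noisy flag is sketched below. It assumes that levels are accumulated on a dB scale, so the ratio becomes a difference of per-frame means. The class `SnrSketch` and its method names are hypothetical; only the field names and the 20 dB cutoff come from this page.

```java
// Illustrative sketch only; the formula is an assumption based on the field names on this page.
class SnrSketch {
    /** Mean speech level minus mean background level, both assumed to be in dB. */
    static double snr(double totalSpeechLevel, long speechFrames,
                      double totalBackgroundLevel, long backgroundFrames) {
        if (speechFrames <= 0 || backgroundFrames <= 0) {
            throw new IllegalArgumentException("need at least one speech and one background frame");
        }
        double meanSpeech = totalSpeechLevel / speechFrames;
        double meanBackground = totalBackgroundLevel / backgroundFrames;
        return meanSpeech - meanBackground;
    }

    /** Documented rule: audio counts as noisy when the SNR is below 20 dB. */
    static boolean isNoisy(double snrDb) {
        return snrDb < 20.0;
    }

    public static void main(String[] args) {
        // Hypothetical totals: 100 speech frames and 100 background frames.
        double snrDb = snr(5000.0, 100, 2500.0, 100);
        System.out.println("SNR = " + snrDb + " dB, noisy = " + isNoisy(snrDb));
    }
}
```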
