public class SpeechClassifier extends AbstractVoiceActivityDetector
This endpointer is composed of two main steps.
The first step, classification of audio into speech and non-speech, uses Bent Schmidt Nielsen's algorithm. Each time audio comes in, the average signal level and the background noise level are updated, using the signal level of the current audio. If the average signal level is greater than the background noise level by a certain threshold value (configurable), then the current audio is marked as speech. Otherwise, it is marked as non-speech.
The second step of this endpointer is documented in the class SpeechMarker.
See Also: SpeechMarker
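As an illustration of the first step, here is a minimal, self-contained sketch of a level-tracking classifier. It is not the library's source. The class name `LevelTrackerSketch`, the helper `frameLevel`, the decibel scaling of the frame level, the smoothing weight, and the initial background value are all assumptions made for this example. Only the roles of `adjustment`, `threshold`, and `minSignal` follow the field and property descriptions on this page.

```java
/** Illustrative level tracker; names and constants here are hypothetical. */
class LevelTrackerSketch {
    // Weight of the previous average when smoothing the level (assumed value).
    private static final double AVERAGE_NUMBER = 1.0;

    private final double adjustment; // how fast the background estimate rises
    private final double threshold;  // required margin of level over background
    private final double minSignal;  // frames below this level are ignored

    private double level = 0.0;        // running average signal level
    private double background = 300.0; // background estimate, starts high (assumed)

    LevelTrackerSketch(double adjustment, double threshold, double minSignal) {
        this.adjustment = adjustment;
        this.threshold = threshold;
        this.minSignal = minSignal;
    }

    /** Frame level as 20 * log10(RMS), with the RMS clamped to at least 1. */
    static double frameLevel(double[] samples) {
        double sumOfSquares = 0.0;
        for (double s : samples) {
            sumOfSquares += s * s;
        }
        double rms = Math.sqrt(sumOfSquares / samples.length);
        return 20.0 * Math.log10(Math.max(rms, 1.0));
    }

    /** Updates both estimates with one frame and reports whether it is speech. */
    boolean classify(double[] samples) {
        double current = frameLevel(samples);
        if (current < minSignal) {
            return false; // too quiet to update the estimates
        }
        // Smooth the signal level with the current frame.
        level = (level * AVERAGE_NUMBER + current) / (AVERAGE_NUMBER + 1);
        // Background drops immediately to quieter frames, rises slowly otherwise.
        if (current < background) {
            background = current;
        } else {
            background += (current - background) * adjustment;
        }
        // The average level is never allowed below the background estimate.
        if (level < background) {
            level = background;
        }
        return level - background > threshold;
    }
}
```

With the default values listed for the properties on this page (`adjustment` 0.003, `threshold` 10.0, `minSignal` 0.0), a run of quiet frames settles the background estimate, and a frame that is much louder then lifts the average level past the threshold and is marked as speech.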
Modifier and Type | Field and Description |
---|---|
protected double | adjustment |
protected double | averageNumber |
protected double | background - background signal level. |
protected long | backgroundFrames |
protected float | frameLengthSec |
protected boolean | isSpeech |
protected double | level - average signal level. |
protected double | minSignal - minimum valid signal level. |
static java.lang.String | PROP_ADJUSTMENT - The property specifying the adjustment. |
static java.lang.String | PROP_FRAME_LENGTH_MS - The property specifying the endpointing frame length in milliseconds. |
static java.lang.String | PROP_MIN_SIGNAL - The property specifying the minimum signal level used to update the background signal level. |
static java.lang.String | PROP_THRESHOLD - The property specifying the threshold. |
protected long | speechFrames |
protected double | threshold |
protected double | totalBackgroundLevel |
protected double | totalSpeechLevel |
Fields inherited from class ConfigurableAdapter: logger
Constructor and Description |
---|
SpeechClassifier() |
SpeechClassifier(int frameLengthMs, double adjustment, double threshold, double minSignal) |
Modifier and Type | Method and Description |
---|---|
protected SpeechClassifiedData | classify(DoubleData audio) - Classifies the given audio frame as speech or non-speech, and updates the endpointing parameters. |
Data | getData() - Returns the next Data object. |
boolean | getNoisy() - Returns whether the input data is estimated to be noisy enough to break recognition. |
double | getSNR() - Returns the accumulated signal-to-noise ratio on the dB scale. |
void | initialize() - Initializes this LevelTracker endpointer and its DataProcessor predecessor. |
boolean | isSpeech() - Returns whether the most recently returned frame contains speech. |
static double | logRootMeanSquare(double[] samples) - Returns the logarithm base 10 of the root mean square of the given samples. |
void | newProperties(PropertySheet ps) - Called when this configurable component needs to be reconfigured. |
protected void | reset() - Resets this LevelTracker to a starting state. |
Methods inherited from class BaseDataProcessor: getPredecessor, setPredecessor
Methods inherited from class ConfigurableAdapter: getName, initLogger, toString
@S4Integer(defaultValue=10) public static final java.lang.String PROP_FRAME_LENGTH_MS
@S4Double(defaultValue=0.0) public static final java.lang.String PROP_MIN_SIGNAL
@S4Double(defaultValue=10.0) public static final java.lang.String PROP_THRESHOLD
@S4Double(defaultValue=0.003) public static final java.lang.String PROP_ADJUSTMENT
protected final double averageNumber
protected double adjustment
protected double level
protected double background
protected double minSignal
protected double threshold
protected float frameLengthSec
protected boolean isSpeech
protected long speechFrames
protected long backgroundFrames
protected double totalBackgroundLevel
protected double totalSpeechLevel
public SpeechClassifier(int frameLengthMs, double adjustment, double threshold, double minSignal)
public SpeechClassifier()
public void newProperties(PropertySheet ps) throws PropertyException
Specified by: newProperties in interface Configurable
Overrides: newProperties in class ConfigurableAdapter
Parameters: ps - a property sheet holding the new data
Throws: PropertyException - if there is a problem with the properties.
public void initialize()
Specified by: initialize in interface DataProcessor
Overrides: initialize in class BaseDataProcessor
protected void reset()
public static double logRootMeanSquare(double[] samples)
Parameters: samples - the samples
protected SpeechClassifiedData classify(DoubleData audio)
Parameters: audio - the audio frame
public Data getData() throws DataProcessingException
Specified by: getData in interface DataProcessor
Specified by: getData in class BaseDataProcessor
Throws: DataProcessingException - if a data processing error occurs
public boolean isSpeech()
Specified by: isSpeech in class AbstractVoiceActivityDetector
public double getSNR()
public boolean getNoisy()
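The accumulator fields `totalSpeechLevel`, `speechFrames`, `totalBackgroundLevel`, and `backgroundFrames` suggest that the signal-to-noise ratio is derived from per-class average levels. The sketch below shows one plausible way to do that. It is an assumption, not the library's implementation: the class `SnrAccumulatorSketch`, its methods, the sign convention (speech minus background), and the caller-supplied noise limit are all hypothetical.

```java
/** Illustrative SNR accumulator; all names here are hypothetical. */
class SnrAccumulatorSketch {
    private double totalSpeechLevel;     // sum of levels (dB) over speech frames
    private double totalBackgroundLevel; // sum of levels (dB) over non-speech frames
    private long speechFrames;
    private long backgroundFrames;

    /** Adds one classified frame level, in dB, to the matching accumulator. */
    void add(double levelDb, boolean isSpeech) {
        if (isSpeech) {
            totalSpeechLevel += levelDb;
            speechFrames++;
        } else {
            totalBackgroundLevel += levelDb;
            backgroundFrames++;
        }
    }

    /**
     * Mean speech level minus mean background level, in dB.
     * Returns NaN until at least one frame of each class has been seen.
     */
    double snr() {
        if (speechFrames == 0 || backgroundFrames == 0) {
            return Double.NaN;
        }
        return totalSpeechLevel / speechFrames
                - totalBackgroundLevel / backgroundFrames;
    }

    /** True when the SNR is below the given limit; false while the SNR is NaN. */
    boolean isNoisy(double minSnrDb) {
        return snr() < minSnrDb;
    }
}
```

Because the levels are already logarithmic, subtracting the two means gives a ratio in decibels directly. The limit passed to `isNoisy` is left to the caller here, since this page does not state the value the class uses.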