public class MelFrequencyFilterBank2 extends BaseDataProcessor
The triangular mel filters in the filter bank are placed on the frequency axis so that each filter's center frequency follows the mel scale. In this way the filter bank mimics the critical bands of hearing, which reflect different perceptual effects in different frequency bands. Additionally, the edges of each filter are placed so that they coincide with the center frequencies of the adjacent filters. Pictorially, the filter bank looks like:
As you might notice in the above figure, the distance at the base from the center to the left edge is different from the distance from the center to the right edge. Since the center frequencies follow the mel-frequency scale, which is a non-linear scale that models the non-linear behavior of human hearing, the mel filter bank corresponds to a warping of the frequency axis. As can be inferred from the figure, filtering with the mel scale emphasizes the lower frequencies, where the filters are narrower and more closely spaced. A common model for the relation between frequencies in mel and linear scales is as follows:
melFrequency = 2595 * log10(1 + linearFrequency/700)
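The mapping above, together with its inverse, can be sketched in a few lines of Java. This assumes the logarithm in the formula is base 10, which is what the constant 2595 implies (it places 1000 Hz at roughly 1000 mel). The class `MelScaleSketch` and its method names are hypothetical helpers for illustration and are not part of this class's API.

```java
/** Hypothetical helper (not part of this API): linear frequency <-> mel conversion. */
final class MelScaleSketch {
    private MelScaleSketch() {}

    /** Converts a linear frequency in Hz to mel, assuming a base-10 logarithm. */
    static double hzToMel(double hz) {
        return 2595.0 * Math.log10(1.0 + hz / 700.0);
    }

    /** Inverse mapping: converts a mel value back to a linear frequency in Hz. */
    static double melToHz(double mel) {
        return 700.0 * (Math.pow(10.0, mel / 2595.0) - 1.0);
    }

    public static void main(String[] args) {
        // Print the mel value for a few linear frequencies.
        for (double hz : new double[] {130.0, 1000.0, 6800.0}) {
            System.out.printf("%.1f Hz -> %.1f mel%n", hz, hzToMel(hz));
        }
    }
}
```

Because the mapping is logarithmic, equal steps in mel correspond to progressively larger steps in Hz, which is the frequency warping described above.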
The constants that define the filterbank are the number of filters, the minimum frequency, and the maximum frequency. The minimum and maximum frequencies determine the frequency range spanned by the filterbank. These frequencies depend on the channel and the sampling frequency that you are using. For telephone speech, since the telephone channel corresponds to a bandpass filter with cutoff frequencies of around 300Hz and 3700Hz, using limits wider than these would waste bandwidth. For clean speech, the minimum frequency should be higher than about 100Hz, since there is no speech information below it. Furthermore, by setting the minimum frequency above 50/60Hz, we get rid of the hum resulting from the AC power, if present.
The maximum frequency has to be lower than the Nyquist frequency, that is, half the sampling rate. Furthermore, there is not much information above 6800Hz that can be used for improving separation between models. Particularly for very noisy channels, a maximum frequency of around 5000Hz may help cut off the noise.
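The constraints on the frequency range can be checked before a filter bank is configured. The following is a minimal sketch, assuming only the rules stated above: the minimum frequency is non-negative and below the maximum, and the maximum is below the Nyquist frequency. `FilterBankRangeCheck` and `isValidRange` are hypothetical names, not part of this class.

```java
/** Hypothetical helper (not part of this API): sanity check for the filter bank range. */
final class FilterBankRangeCheck {
    private FilterBankRangeCheck() {}

    /**
     * Returns true when 0 <= minFreq < maxFreq < sampleRate / 2.
     * The strict upper bound follows the text, which asks for a maximum
     * frequency lower than the Nyquist frequency.
     */
    static boolean isValidRange(double minFreq, double maxFreq, double sampleRate) {
        double nyquist = sampleRate / 2.0;
        return minFreq >= 0.0 && minFreq < maxFreq && maxFreq < nyquist;
    }

    public static void main(String[] args) {
        // 6800 Hz fits under the 8000 Hz Nyquist limit of 16 kHz audio,
        // but not under the 4000 Hz limit of 8 kHz audio.
        System.out.println(isValidRange(130.0, 6800.0, 16000.0));
        System.out.println(isValidRange(130.0, 6800.0, 8000.0));
    }
}
```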
Typical values for the constants defining the filter bank are:
Sample rate (Hz) | 16000 | 11025 | 8000 |
---|---|---|---|
numberFilters | 40 | 36 | 31 |
minimumFrequency (Hz) | 130 | 130 | 200 |
maximumFrequency (Hz) | 6800 | 5400 | 3500 |
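To make the filter placement concrete, the sketch below computes the edge and center frequencies for the 16000 Hz column of the table (40 filters, 130 Hz to 6800 Hz). It uses a common textbook construction: `numberFilters + 2` points equally spaced on the mel scale, with filter `i` spanning points `i`, `i + 1`, and `i + 2`, so that each filter's edges coincide with its neighbors' centers. This is an illustration under that assumption and is not necessarily how this class computes its filters; `MelFilterLayoutSketch` and its methods are hypothetical names.

```java
/** Hypothetical helper (not part of this API): textbook mel filter placement. */
final class MelFilterLayoutSketch {
    private MelFilterLayoutSketch() {}

    static double hzToMel(double hz) {
        return 2595.0 * Math.log10(1.0 + hz / 700.0);
    }

    static double melToHz(double mel) {
        return 700.0 * (Math.pow(10.0, mel / 2595.0) - 1.0);
    }

    /**
     * Returns numberFilters + 2 frequencies in Hz, equally spaced in mel.
     * Filter i has left edge points[i], center points[i + 1], right edge points[i + 2].
     */
    static double[] edgePoints(double minFreq, double maxFreq, int numberFilters) {
        double melMin = hzToMel(minFreq);
        double melMax = hzToMel(maxFreq);
        double step = (melMax - melMin) / (numberFilters + 1);
        double[] points = new double[numberFilters + 2];
        for (int i = 0; i < points.length; i++) {
            points[i] = melToHz(melMin + i * step);
        }
        return points;
    }

    public static void main(String[] args) {
        double[] p = edgePoints(130.0, 6800.0, 40);
        // Show the first three filters: left edge, center, right edge.
        for (int i = 0; i < 3; i++) {
            System.out.printf("filter %d: %.1f / %.1f / %.1f Hz%n", i, p[i], p[i + 1], p[i + 2]);
        }
    }
}
```

In this construction the right half of every triangle is wider in Hz than its left half, and the filters widen toward higher frequencies, which matches the asymmetry and low-frequency emphasis described earlier.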
Davis and Mermelstein showed that mel-frequency cepstral coefficients have robust characteristics that make them well suited to speech recognition. For details, see Davis and Mermelstein, Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences, IEEE Transactions on Acoustics, Speech, and Signal Processing, 1980.
See Also: MelFilter2
Modifier and Type | Field and Description |
---|---|
static java.lang.String | PROP_MAX_FREQ: The property for the maximum frequency covered by the filterbank. |
static java.lang.String | PROP_MIN_FREQ: The property for the minimum frequency covered by the filterbank. |
static java.lang.String | PROP_NUMBER_FILTERS: The property for the number of filters in the filterbank. |
logger
Constructor and Description |
---|
MelFrequencyFilterBank2() |
MelFrequencyFilterBank2(double minFreq, double maxFreq, int numberFilters) |
Modifier and Type | Method and Description |
---|---|
Data | getData(): Reads the next Data object, which is the power spectrum of an audio input frame. |
void | initialize(): Initializes this DataProcessor. |
void | newProperties(PropertySheet ps): This method is called when this configurable component needs to be reconfigured. |
getPredecessor, setPredecessor
getName, initLogger, toString
@S4Integer(defaultValue=40) public static final java.lang.String PROP_NUMBER_FILTERS
@S4Double(defaultValue=130.0) public static final java.lang.String PROP_MIN_FREQ
@S4Double(defaultValue=6800.0) public static final java.lang.String PROP_MAX_FREQ
public MelFrequencyFilterBank2(double minFreq, double maxFreq, int numberFilters)
public MelFrequencyFilterBank2()
public void newProperties(PropertySheet ps) throws PropertyException
Configurable
Specified by: newProperties in interface Configurable
Overrides: newProperties in class ConfigurableAdapter
Parameters: ps - a property sheet holding the new data
Throws: PropertyException - if there is a problem with the properties.
public void initialize()
BaseDataProcessor
Specified by: initialize in interface DataProcessor
Overrides: initialize in class BaseDataProcessor
public Data getData() throws DataProcessingException
Specified by: getData in interface DataProcessor
Overrides: getData in class BaseDataProcessor
Throws: DataProcessingException - if there is a data processing error