ConcatFileDataSource (sphinx4-core 5prealpha-SNAPSHOT API)

java.lang.Object
- edu.cmu.sphinx.util.props.ConfigurableAdapter
  - edu.cmu.sphinx.frontend.BaseDataProcessor
    - edu.cmu.sphinx.frontend.util.StreamDataSource
      - edu.cmu.sphinx.frontend.util.ConcatFileDataSource

All Implemented Interfaces: DataProcessor, Configurable, ReferenceSource

public class ConcatFileDataSource
extends StreamDataSource
implements ReferenceSource

Concatenates a list of raw, headerless audio files into one continuous audio stream. A DataStartSignal is placed before the start of the first file, and a DataEndSignal after the last file. No DataStartSignal or DataEndSignal is placed between files. Optionally, silence can be added between the audio files by setting the property:

edu.cmu.sphinx.frontend.util.ConcatFileDataSource.silenceFile

to an audio file containing silence. By default, no silence is added. You can also specify how many files to skip for every file read.
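As a rough illustration of the concatenation described above, the sketch below chains raw byte segments into a single stream using only JDK classes, optionally placing a silence segment between consecutive segments. It is not the sphinx4 implementation: the class `RawConcatSketch` and its `concat` method are hypothetical names, and in-memory byte arrays stand in for the raw audio files.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative only: not part of sphinx4. */
final class RawConcatSketch {

    /**
     * Chains headerless audio segments into one stream. When silence is
     * non-null, its bytes are inserted between consecutive segments
     * (never before the first segment or after the last one).
     */
    static InputStream concat(List<byte[]> segments, byte[] silence) {
        List<InputStream> parts = new ArrayList<>();
        for (int i = 0; i < segments.size(); i++) {
            if (i > 0 && silence != null) {
                parts.add(new ByteArrayInputStream(silence));
            }
            parts.add(new ByteArrayInputStream(segments.get(i)));
        }
        // SequenceInputStream reads each part to its end, then moves to the next.
        return new SequenceInputStream(Collections.enumeration(parts));
    }
}
```

With real files, each `ByteArrayInputStream` could be swapped for a `FileInputStream`. Because the files are headerless, no bytes need to be stripped before chaining.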

You can also specify the name of a transcript file to write the transcription to. The transcription will be written in HUB-4 style. A sample HUB-4 transcript looks like:

 bn99en_1 1 peter_jennings 0.806084 7.079850 <o,f4,male> Tonight this
 Thursday big pressure on the Clinton administration to do something about
 the latest killing in Yugoslavia
 bn99en_1 1 peter_jennings 7.079850 14.007608 <o,fx,male> Airline passengers
 and outrageous behavior at thirty thousand feet What can an airline do
 ...
 bn99en_1 1 inter_segment_gap 23.097000 28.647000 <o,fx,>
 ...

The format of each line is:

 test_set_name category speaker_name start_time_in_seconds
 end_time_in_seconds <category,hub4_focus_conditions,speaker_sex> transcript

In our example above,

 test_set_name is "bn99en_1"
 category is "1"
 speaker_name is "peter_jennings"
 start_time_in_seconds is "0.806084"
 end_time_in_seconds is "7.079850"
 category is "o" for "Overall"
 hub4_focus_conditions is:
     "f0" for "Baseline Broadcast Speech"
     "f1" for "Spontaneous Broadcast Speech"
     "f2" for "Speech Over Telephone Channels"
     "f3" for "Speech in the Presence of Background Music"
     "f4" for "Speech Under Degraded Acoustic Conditions"
     "f5" for "Speech from Non-Native Speakers"
     "fx" for "All other speech"
 speaker_sex is "male"
 transcript is "Tonight this Thursday big pressure on the Clinton
 administration to do something about the latest killing in Yugoslavia"
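The field breakdown above can be sketched as a small parser. This is an illustration rather than code from sphinx4: `Hub4Record` and its field names are hypothetical, and the sketch assumes each record has already been joined onto a single line (the sample above wraps long transcripts across lines).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: a hypothetical holder for one HUB-4 style record. */
final class Hub4Record {
    // Five whitespace-separated fields, then <category,focus,sex>, then free text.
    private static final Pattern LINE = Pattern.compile(
        "^\\s*(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+"
        + "<([^,>]*),([^,>]*),([^>]*)>\\s*(.*)$");

    final String testSet;
    final String category;
    final String speaker;
    final double start;           // seconds
    final double end;             // seconds
    final String overallCategory; // e.g. "o"
    final String focusCondition;  // e.g. "f4" or "fx"
    final String sex;             // may be empty, as in the inter_segment_gap line
    final String transcript;      // may be empty

    private Hub4Record(Matcher m) {
        testSet = m.group(1);
        category = m.group(2);
        speaker = m.group(3);
        start = Double.parseDouble(m.group(4));
        end = Double.parseDouble(m.group(5));
        overallCategory = m.group(6);
        focusCondition = m.group(7);
        sex = m.group(8);
        transcript = m.group(9).trim();
    }

    /** Parses one single-line record, or throws if the line does not fit the layout. */
    static Hub4Record parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a HUB-4 style record: " + line);
        }
        return new Hub4Record(m);
    }
}
```

The angle-bracket group is matched permissively so that a record with an empty speaker_sex and no transcript, such as the inter_segment_gap line, still parses.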

The ConcatFileDataSource produces such a transcript if the name of the file to write to is supplied in the constructor. This transcript file is used to detect gap insertion errors, because it accurately describes the "correct" sequence of speech and silences in the concatenated version of the audio files.
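To make the timing idea concrete, the sketch below shows one way the start and end times of a silence segment could be derived from byte counts and written as an inter_segment_gap record. It is an assumption-laden illustration, not the class's actual code: `TranscriptTimingSketch`, `seconds`, and `gapLine` are hypothetical names, the audio is assumed to be mono PCM, and the fixed `1` category and `<o,fx,>` fields are simply copied from the sample record above.

```java
import java.util.Locale;

/** Illustrative only: not part of sphinx4. */
final class TranscriptTimingSketch {

    /** Duration in seconds of byteCount bytes of mono PCM audio. */
    static double seconds(long byteCount, int sampleRate, int bitsPerSample) {
        int bytesPerSample = bitsPerSample / 8;
        return byteCount / (double) (sampleRate * bytesPerSample);
    }

    /** Formats a silence record in the style of the inter_segment_gap sample line. */
    static String gapLine(String testSet, double start, double end) {
        // Locale.ROOT keeps the decimal separator a '.' regardless of the default locale.
        return String.format(Locale.ROOT,
            "%s 1 inter_segment_gap %.6f %.6f <o,fx,>", testSet, start, end);
    }
}
```

For example, 32000 bytes of 16-bit audio sampled at 16000 Hz last 32000 / (16000 × 2) = 1 second, so a silence segment of that size beginning at 7.0 s would end at 8.0 s.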

Field Summary

Fields
Modifier and Type	Field and Description
`static java.lang.String`	`PROP_ADD_RANDOM_SILENCE` The property that specifies whether to add random silence.
`static java.lang.String`	`PROP_BATCH_FILE` The property for the file containing a list of audio files to read from.
`static java.lang.String`	`PROP_MAX_SILENCE` The property that specifies the maximum number of times the silence file is added between files.
`static java.lang.String`	`PROP_SILENCE_FILE` The property that specifies the silence audio file, if any.
`static java.lang.String`	`PROP_SKIP` The property that specifies the number of files to skip for every file read.
`static java.lang.String`	`PROP_START_FILE` The property that specifies which file to start at.
`static java.lang.String`	`PROP_TOTAL_FILES` The property that specifies the total number of files to read.
`static java.lang.String`	`PROP_TRANSCRIPT_FILE` The property that specifies the name of the transcript file.

Fields inherited from class edu.cmu.sphinx.frontend.util.StreamDataSource
bitsPerSample, PROP_BIG_ENDIAN_DATA, PROP_BITS_PER_SAMPLE, PROP_BYTES_PER_READ, PROP_SAMPLE_RATE, PROP_SIGNED_DATA, sampleRate

Fields inherited from class edu.cmu.sphinx.util.props.ConfigurableAdapter
logger

Constructor Summary

Constructors
Constructor and Description
`ConcatFileDataSource()`
`ConcatFileDataSource(int sampleRate, int bytesPerRead, int bitsPerSample, boolean bigEndian, boolean signedData, boolean addRandomSilence, int maxSilence, int skip, java.lang.String silenceFileName, int startFile, int totalFiles, java.lang.String transcriptFile, java.lang.String batchFile)`

Method Summary

Modifier and Type	Method and Description
`java.util.List<java.lang.String>`	`getReferences()` Returns a list of all reference text.
`java.lang.String`	`getTranscriptFile()` Returns the name of the transcript file.
`void`	`initialize()` Initializes a ConcatFileDataSource.
`void`	`newProperties(PropertySheet ps)` This method is called when this configurable component needs to be reconfigured.

Methods inherited from class edu.cmu.sphinx.frontend.util.StreamDataSource
getData, setInputStream, setInputStream

Methods inherited from class edu.cmu.sphinx.frontend.BaseDataProcessor
getPredecessor, setPredecessor

Methods inherited from class edu.cmu.sphinx.util.props.ConfigurableAdapter
getName, initLogger, toString

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

- Field Detail
  - PROP_START_FILE
```
@S4Integer(defaultValue=1)
public static final java.lang.String PROP_START_FILE
```
    The property that specifies which file to start at.
    
    See Also:
    
    Constant Field Values
  - PROP_SKIP
```
@S4Integer(defaultValue=0)
public static final java.lang.String PROP_SKIP
```
    The property that specifies the number of files to skip for every file read.
    
    See Also:
    
    Constant Field Values
  - PROP_TOTAL_FILES
```
@S4Integer(defaultValue=-1)
public static final java.lang.String PROP_TOTAL_FILES
```
    The property that specifies the total number of files to read. The default value of -1 means no limit.
    
    See Also:
    
    Constant Field Values
  - PROP_SILENCE_FILE
```
@S4String
public static final java.lang.String PROP_SILENCE_FILE
```
    The property that specifies the silence audio file, if any. If this property is null, then no silences are added in between files.
    
    See Also:
    
    Constant Field Values
  - PROP_ADD_RANDOM_SILENCE
```
@S4Boolean(defaultValue=false)
public static final java.lang.String PROP_ADD_RANDOM_SILENCE
```
    The property that specifies whether to add random silence.
    
    See Also:
    
    Constant Field Values
  - PROP_MAX_SILENCE
```
@S4Integer(defaultValue=3)
public static final java.lang.String PROP_MAX_SILENCE
```
    The property that specifies the maximum number of times the silence file is added between files. If PROP_ADD_RANDOM_SILENCE is set to true, the number of times the silence file is added is between 1 and this value. If PROP_ADD_RANDOM_SILENCE is set to false, this value will be the number of times the silence file is added. So if PROP_MAX_SILENCE is set to 3, then the silence file will be added three times between files.
    
    See Also:
    
    Constant Field Values
  - PROP_TRANSCRIPT_FILE
```
@S4String
public static final java.lang.String PROP_TRANSCRIPT_FILE
```
    The property that specifies the name of the transcript file. If this property is set, a transcript file will be created. No transcript file will be created if this property is not set.
    
    See Also:
    
    Constant Field Values
  - PROP_BATCH_FILE
```
@S4String
public static final java.lang.String PROP_BATCH_FILE
```
    The property for the file containing a list of audio files to read from.
    
    See Also:
    
    Constant Field Values
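The interaction between PROP_ADD_RANDOM_SILENCE and PROP_MAX_SILENCE described above can be sketched as follows. This is a minimal illustration, not the sphinx4 source: `SilenceCountSketch` and `repetitions` are hypothetical names, the random range is assumed to be inclusive at both ends, and rejecting a maximum below 1 in random mode is this sketch's own choice.

```java
import java.util.Random;

/** Illustrative only: not part of sphinx4. */
final class SilenceCountSketch {

    /**
     * Returns how many times the silence file would be added between two
     * audio files: exactly maxSilence when random silence is off, otherwise
     * a value from 1 to maxSilence (assumed inclusive).
     */
    static int repetitions(boolean addRandomSilence, int maxSilence, Random random) {
        if (!addRandomSilence) {
            return maxSilence;
        }
        if (maxSilence < 1) {
            throw new IllegalArgumentException("maxSilence must be at least 1");
        }
        // nextInt(n) yields 0..n-1, so shift by one to get 1..n.
        return 1 + random.nextInt(maxSilence);
    }
}
```

With the default values (random silence off, maximum of 3), this returns 3, matching the "three times between files" case in the PROP_MAX_SILENCE description.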
- Constructor Detail
  - ConcatFileDataSource
```
public ConcatFileDataSource(int sampleRate,
                            int bytesPerRead,
                            int bitsPerSample,
                            boolean bigEndian,
                            boolean signedData,
                            boolean addRandomSilence,
                            int maxSilence,
                            int skip,
                            java.lang.String silenceFileName,
                            int startFile,
                            int totalFiles,
                            java.lang.String transcriptFile,
                            java.lang.String batchFile)
```
  - ConcatFileDataSource
```
public ConcatFileDataSource()
```
- Method Detail
  - newProperties
```
public void newProperties(PropertySheet ps)
                   throws PropertyException
```
    Description copied from interface: Configurable
    
    This method is called when this configurable component needs to be reconfigured.
    
    Specified by:
    
    newProperties in interface Configurable
    
    Overrides:
    
    newProperties in class StreamDataSource
    
    Parameters:
    
    ps - a property sheet holding the new data
    
    Throws:
    
    PropertyException - if there is a problem with the properties.
  - initialize
```
public void initialize()
```
    Initializes a ConcatFileDataSource.
    
    Specified by:
    
    initialize in interface DataProcessor
    
    Overrides:
    
    initialize in class StreamDataSource
  - getReferences
```
public java.util.List<java.lang.String> getReferences()
```
    Returns a list of all reference text. Implements the getReferences() method of ReferenceSource.
    
    Specified by:
    
    getReferences in interface ReferenceSource
    
    Returns:
    
    a list of all reference text
  - getTranscriptFile
```
public java.lang.String getTranscriptFile()
```
    Returns the name of the transcript file.
    
    Returns:
    
    the name of the transcript file
