Research Using CMUSphinx
The CMUSphinx toolkit is actively used in speech recognition research. Some of the publications worth mentioning are listed below.
Ph.D. Theses
Ziad Al Bawab, An Analysis-by-Synthesis Approach to Vocal Tract Modeling for Robust Speech Recognition, Ph.D. Thesis, ECE Department, CMU, September 2009.
Xiang Li, Combination and Generation of Parallel Feature Streams for Improved Speech Recognition, Ph.D. Thesis, ECE Department, CMU, February 2005.
Jon P. Nedel, Duration Normalization for Robust Recognition of Spontaneous Speech via Missing Feature Methods, Ph.D. Thesis, ECE Department, CMU, April 2004.
Michael L. Seltzer, Microphone Array Processing for Robust Speech Recognition, Ph.D. Thesis, ECE Department, CMU, July 2003.
Sam-Joo Doh, Enhancements to Transformation-Based Speaker Adaptation: Principal Component and Inter-Class Maximum Likelihood Linear Regression, Ph.D. Thesis, ECE Department, CMU, July 2000.
Juan M. Huerta, Robust Speech Recognition in GSM Codec Environments, Ph.D. Thesis, ECE Department, CMU, April 2000.
Bhiksha Raj, Reconstruction of Incomplete Spectrograms for Robust Speech Recognition, Ph.D. Thesis, ECE Department, CMU, April 2000.
Matthew A. Siegler, Integration of Continuous Speech Recognition and Information Retrieval for Mutually Optimal Performance, Ph.D. Thesis, ECE Department, CMU, December 1999.
Evandro B. Gouvea, Acoustic-Feature-Based Frequency Warping for Speaker Normalization, Ph.D. Thesis, ECE Department, CMU, February 1999.
Thomas M. Sullivan, Multi-Microphone Correlation-Based Processing for Robust Automatic Speech Recognition, Ph.D. Thesis, ECE Department, CMU, August 1996.
Pedro J. Moreno, Speech Recognition in Noisy Environments, Ph.D. Thesis, ECE Department, CMU, May 1996.
Fu-Hua Liu, Environmental Adaptation for Robust Speech Recognition, Ph.D. Thesis, ECE Department, CMU, June 1994.
Yoshiaki Ohshima, Environmental Robustness in Speech Recognition using Physiologically-Motivated Signal Processing, Ph.D. Thesis, ECE Department, CMU, December 1993.
William A. Rozzi, Speaker Adaptation in Automatic Speech Recognition via Estimation of Correlated Mean Vectors, Ph.D. Thesis, ECE Department, CMU, May 1991.
Alejandro Acero, Acoustical and Environmental Robustness for Automatic Speech Recognition, Ph.D. Thesis, ECE Department, CMU, September 1990.
MS Reports
Balakrishnan Narayanaswamy, Improved Text-Independent Speaker Recognition using Gaussian Mixture Probabilities, Master’s Report, ECE Department, CMU, May 2005.
Michael Seltzer, Automatic Detection of Corrupted Speech Features for Robust Speech Recognition, Master’s Report, ECE Department, CMU, May 2000.
Jon Nedel, Integration of Speech and Video: Applications for Lip Synch: Lip Movement Synthesis and Time Warping, Master’s Report, ECE Department, CMU, May 1999.
Uday Jain, Connected Digit Recognition over Long Distance Telephone Lines Using the SPHINX-II System, Master’s Report, ECE Department, CMU, May 1995.
Matthew Siegler, Effects of Speech Rate on Speech Recognition Accuracy, Master’s Report, ECE Department, CMU, December 1995.
Pedro J. Moreno, Speech Recognition in Telephone Environments, Master’s Report, ECE Department, CMU, January 1993.
Papers and Talks
H.-M. Park and R. M. Stern, “Spatial separation of speech signals using amplitude estimation based on interaural comparisons of zero crossings,” Speech Communication, January, 2009.
Y.-H. B. Chiu and R. M. Stern, “Minimum variance modulation filters for robust speech recognition,” IEEE International Conference on Acoustics, Speech, and Signal Processing, April 2009, Taipei, Taiwan.
Z. Al Bawab, L. Turicchia, R. M. Stern, and B. Raj, “Deriving vocal tract shapes from electromagnetic articulograph data via geometric adaptation and matching,” Interspeech 2009, September 2009, Brighton, United Kingdom.
L. Buera, A. Miguel, A. Ortega, E. Lleida, and R. Stern, “Unsupervised training scheme with non-stereo data for empirical feature vector compensation,” Interspeech 2009, September 2009, Brighton, United Kingdom.
Y.-H. B. Chiu, B. Raj, and R. M. Stern, “Toward fusion of feature extraction and acoustic model training: a top-down process for robust speech recognition,” Interspeech 2009, September 2009, Brighton, United Kingdom.
L. Gu and R. M. Stern, “Speaker segmentation and clustering for simultaneously-presented speech,” Interspeech 2009, September 2009, Brighton, United Kingdom.
C. Kim, K. Kumar, B. Raj, and R. M. Stern, “Signal separation for robust speech recognition based on phase difference information obtained in the frequency domain,” Interspeech 2009, September 2009, Brighton, United Kingdom.
C. Kim and R. M. Stern, “Feature extraction for robust speech recognition using a power-law nonlinearity and power-bias subtraction,” Interspeech 2009, September 2009, Brighton, United Kingdom.
R. Stern, E. Gouvea, C. Kim, K. Kumar, and H.-M. Park, “Binaural and multiple-microphone signal processing motivated by auditory perception,” HSCMA Joint Workshop on Hands-free Speech Communication and Microphone Arrays, May 2008, Trento, Italy.
Z. Al Bawab, B. Raj, and R. M. Stern, “Analysis-by-synthesis features for speech recognition,” IEEE International Conference on Acoustics, Speech, and Signal Processing, April 2008, Las Vegas, Nevada.
L. Gu and R. M. Stern, “Single-channel speech separation based on modulation frequency,” IEEE International Conference on Acoustics, Speech, and Signal Processing, April 2008, Las Vegas, Nevada.
K. Kumar and R. M. Stern, “Environment-invariant compensation for reverberation using linear post-filtering for minimum distortion,” IEEE International Conference on Acoustics, Speech, and Signal Processing, April 2008, Las Vegas, Nevada.
Y.-H. Chiu and R. M. Stern, “Analysis of physiologically-motivated signal processing for robust speech recognition,” Interspeech 2008, September 2008, Brisbane, Australia.
C. Kim and R. M. Stern, “Robust Signal-to-Noise Ratio Estimation Based on Waveform Amplitude Distribution Analysis,” Interspeech 2008, September 2008, Brisbane, Australia.
H.-M. Park and R. M. Stern, “Missing-feature speech recognition using dereverberation and echo suppression in reverberant environments,” IEEE International Conference on Acoustics, Speech, and Signal Processing, April 2007, Honolulu, Hawaii.
K. Kumar, T. Chen, and R. M. Stern, “Profile view lip reading,” IEEE International Conference on Acoustics, Speech, and Signal Processing, April 2007, Honolulu, Hawaii.
R. M. Stern, E. Gouvea, and G. Thattai, “‘Polyaural’ array processing for automatic speech recognition in degraded environments,” Proc. Interspeech 2007, August 2007, Antwerp, Belgium.
M. L. Seltzer and R. M. Stern, “Subband Likelihood-Maximizing Beamforming for Speech Recognition in Reverberant Environments,” IEEE Trans. on Audio, Speech, and Language Processing, 14 (6): 2109-2121, November 2006.
R. M. Stern, DeL. Wang, and G. Brown, “Binaural sound localization,” Chapter in Computational Auditory Scene Analysis, G. Brown and DeL. Wang, Eds., Wiley/IEEE Press, 2006.
R. M. Stern, C. Trahiotis, and A. Ripepi, “Fluctuations in amplitude and frequency enable interaural delays to foster the identification of speech-like stimuli,” Chapter in Dynamics of Speech Production and Perception, P. Divenyi et al., Eds., IOS Press, 2006.
H.-M. Park and R. M. Stern, “Spatial separation of speech signals using continuously-variable masks estimated from comparisons of zero crossings,” IEEE International Conference on Acoustics, Speech, and Signal Processing, May 2006, Toulouse, France.
W. Kim and R. M. Stern, “Band-independent mask estimation for missing-feature reconstruction,” IEEE International Conference on Acoustics, Speech, and Signal Processing, May 2006, Toulouse, France.
C. Kim, Y.-H. Chiu, and R. M. Stern, “Physiologically-motivated synchrony-based processing for robust automatic speech recognition,” Interspeech 2006, September 2006, Pittsburgh, Pennsylvania.
B. Narayanaswamy, R. Gangadharaiah, and R. M. Stern, “Voting for two speaker segmentation,” Interspeech 2006, September 2006, Pittsburgh, Pennsylvania.
B. Raj and R. M. Stern, “Missing-Feature Methods for Robust Automatic Speech Recognition,” IEEE Signal Processing Magazine, 22 (5):101-116, September 2005.
N.S. Kim, W. Lim, and R. M. Stern, “Feature compensation based on switching linear dynamic model,” IEEE Signal Processing Letters, 12 (6): 473-476, June, 2005.
W. Kim, R. M. Stern, and H. Ko, “Environment-Independent Mask Estimation for Missing Feature Reconstruction,” Proc. Eurospeech-2005, September 2005, Lisbon, Portugal.
B. Raj, M. L. Seltzer, and R. M. Stern, “Reconstruction of Missing Features for Robust Speech Recognition,” Speech Communication Journal 43(4): 275-296, September 2004.
M. L. Seltzer, B. Raj, and R. M. Stern, “A Bayesian Framework for Spectrographic Mask Estimation for Missing Feature Speech Recognition,” Speech Communication Journal 43(4): 379-393, September 2004.
M. L. Seltzer, B. Raj, and R. M. Stern, “Likelihood-Maximizing Beamforming for Robust Hands-Free Speech Recognition,” IEEE Trans. on Speech and Audio Processing, 12(5): 489-498, September 2004.
R. M. Stern, “Signal Separation Motivated by Human Auditory Perception: Applications to Automatic Speech Recognition,” in Speech Separation by Humans and Machines, P. Divenyi, Ed., Springer-Verlag, 2004.
Y. Obuchi, N. Hataoka, and R. M. Stern, “Normalization of Time-Derivative Parameters for Robust Speech Recognition in Small Devices,” IEICE Transactions on Information and Systems 87-D(4): 1004-1011, April 2004.
X. Li and R. M. Stern, “Feature Generation Based on Maximum Normalized Acoustic Likelihood for Improved Speech Recognition,” IEEE International Conference on Acoustics, Speech, and Signal Processing, May 2004, Montreal, Quebec.
B. Raj, R. Singh, and R. M. Stern, “On Tracking Noise with Linear Dynamical System Models,” IEEE International Conference on Acoustics, Speech, and Signal Processing, May 2004, Montreal, Quebec.
M. L. Seltzer and R. M. Stern, “Parameter Sharing in Subband Likelihood-Maximizing Beamforming for Speech Recognition using Microphone Arrays,” IEEE International Conference on Acoustics, Speech, and Signal Processing, May 2004, Montreal, Quebec.
X. Li and R. M. Stern, “Parallel Feature Generation Based on Maximum Normalized Acoustic Likelihood for Improved Combination Performance,” International Conference on Spoken Language Processing, October, 2004, Jeju Island, Korea.
B. Raj and R. Singh, “Classifier-Based Non-Linear Projection for Adaptive Endpointing of Continuous Speech,” Computer Speech and Language 17(1):5-26, January 2003.
M. L. Seltzer and B. Raj, “Speech Recognizer Based Filter Optimization for Microphone Array Processing,” IEEE Signal Processing Letters 10(3):69-71, March 2003.
M. Seltzer and R. Stern, “Subband Parameter Optimization of Microphone Arrays for Speech Recognition in Reverberant Environments,” IEEE International Conference on Acoustics, Speech, and Signal Processing, April 2003, Hong Kong.
X. Li and R. Stern, “Training of Stream Weights for the Decoding of Speech using Parallel Feature Streams,” IEEE International Conference on Acoustics, Speech, and Signal Processing, April 2003, Hong Kong.
X. Li and R. M. Stern, “Feature Generation Based on Maximum Classification Probability for Improved Speech Recognition,” Proc. Eurospeech-2003, September 2003, Geneva, Switzerland.
J. P. Nedel and R. M. Stern, “Duration Normalization and Hypothesis Combination for Improved Spontaneous Speech Recognition,” Proc. Eurospeech-2003, September 2003, Geneva, Switzerland.
Y. Obuchi and R. M. Stern, “Normalization of Time-Derivative Parameters using Histogram Equalization,” Proc. Eurospeech-2003, September 2003, Geneva, Switzerland.
R. Singh, B. Raj, and R. M. Stern, “Automatic Generation of Subword Units for Speech Recognition Systems,” IEEE Transactions on Speech and Audio Processing, 10(2): 89-99, 2002.
R. Singh, R. M. Stern, and B. Raj, “Signal and Feature Compensation Methods for Robust Speech Recognition,” Chapter in CRC Handbook on Noise Reduction in Speech Applications, Gillian Davis, Ed., CRC Press, 2002.
R. Singh, B. Raj, and R. M. Stern, “Model Compensation and Matched Condition Methods for Robust Speech Recognition,” Chapter in CRC Handbook on Noise Reduction in Speech Applications, Gillian Davis, Ed., CRC Press, 2002.
M. L. Seltzer, B. Raj, and R. M. Stern, “Speech Recognizer-Based Microphone Array Processing for Robust Hands-Free Speech Recognition,” Proc. IEEE Conf. on Acoustics, Speech, and Sig. Proc., May, 2002, Orlando, Florida.
X. Li, R. Singh, and R. M. Stern, “Lattice Combination for Improved Speech Recognition,” Proc. of the International Conference of Spoken Language Processing, September, 2002, Denver, Colorado.
J. M. Huerta and R. M. Stern, “Distortion-Class Modeling for Robust Speech Recognition under GSM RPE-LTP Coding,” Speech Communication Journal, 34:213-225.
R. Singh, M. L. Seltzer, B. Raj, and R. M. Stern, “Speech in Noisy Environments: Robust Automatic Segmentation, Feature Extraction, and Hypothesis Combination,” Proc. IEEE Conf. on Acoustics, Speech, and Sig. Proc., May, 2001, Salt Lake City, Utah.
J. P. Nedel and R. M. Stern, “Duration Normalization for Improved recognition of Spontaneous and Read Speech via Missing Feature Methods,” Proc. IEEE Conf. on Acoustics, Speech, and Sig. Proc., May, 2001, Salt Lake City, Utah.
D. P. W. Ellis, R. Singh, and S. Sivadas, “Tandem Acoustic Modeling in Large-Vocabulary Recognition,” Proc. IEEE Conf. on Acoustics, Speech, and Sig. Proc., May, 2001, Salt Lake City, Utah.
M. L. Seltzer and B. Raj, “Calibration of Microphone Arrays for Improved Speech Recognition,” Proc. Eurospeech-2001, September 2001, Aalborg, Denmark.
B. Raj, M. L. Seltzer, and R. M. Stern, “Robust Speech Recognition: The Case for Restoring Missing Features,” Proc. of the Workshop on Consistent and Reliable Acoustic Cues, September, 2001, Aalborg, Denmark.
S.-J. Doh and R. M. Stern, “Using Class Weighting in Inter-Class MLLR,” Proc. of the International Conference of Spoken Language Processing, October, 2000, Beijing, China.
J. M. Huerta and R. M. Stern, “Instantaneous Distortion-Based Weighted Acoustic Modeling for Robust Recognition of Coded Speech,” Proc. of the International Conference of Spoken Language Processing, October, 2000, Beijing, China.
J. P. Nedel, R. Singh, and R. M. Stern, “Automatic Subword Unit Refinement for Spontaneous Speech Recognition via Phoneword Splitting,” Proc. of the International Conference of Spoken Language Processing, October, 2000, Beijing, China.
J. P. Nedel, R. Singh, and R. M. Stern, “Phone Transition Acoustic Modeling: Application to Speaker Independent and Spontaneous Speech Systems,” Proc. of the International Conference of Spoken Language Processing, October, 2000, Beijing, China.
B. Raj, M. L. Seltzer, and R. M. Stern, “Reconstruction of Damaged Spectrographic Features for Robust Speech Recognition,” Proc. of the International Conference of Spoken Language Processing, October, 2000, Beijing, China.
M. L. Seltzer, B. Raj, and R. M. Stern, “Classifier-Based Mask Estimation for Missing Feature Methods of Robust Speech Recognition,” Proc. of the International Conference of Spoken Language Processing, October, 2000, Beijing, China.
R. Singh, B. Raj, and R. M. Stern, “Structured Redefinition of Sound Units by Merging and Splitting for Improved Speech Recognition,” Proc. of the International Conference of Spoken Language Processing, October, 2000, Beijing, China.
S.-J. Doh and R. M. Stern, “Inter-Class MLLR for Speaker Adaptation,” Proc. IEEE Conf. on Acoustics, Speech, and Sig. Proc., June, 2000, Istanbul, Turkey.
R. Singh, B. Raj, and R. M. Stern, “Automatic Generation of Phone Sets and Lexical Transcriptions,” Proc. IEEE Conf. on Acoustics, Speech, and Sig. Proc., June, 2000, Istanbul, Turkey.
M. Ravishankar, R. Singh, B. Raj, and R. M. Stern, “The 1999 CMU 10X Real Time Broadcast News Transcription System,” Proc. NIST Speech Transcription Workshop, May, 2000, College Park, Maryland.
S.-J. Doh and R. M. Stern, “Weighted principal component MLLR for speaker adaptation,” Proc. of Automatic Speech Recognition and Understanding Workshop (ASRU 99), Colorado, USA, 1999.
R. Singh, B. Raj, and R. M. Stern, “Automatic Clustering And Generation of Contextual Questions For Tied States In Hidden Markov Models,” Proc. of the ICASSP, Phoenix, Arizona, March, 1999.
J. M. Huerta and R. M. Stern, “Distortion-Class Weighted Acoustic Modeling for Robust Recognition under GSM RPE-LTP Coding,” Proc. of the International Symposium on Robust Speech Recognition, Tampere, Finland, June, 1999.
R. Singh, B. Raj, and R. M. Stern, “Domain Adduced State Tying for Cross-domain Acoustic Modelling,” Proc. Eurospeech-99, September, 1999, Budapest, Hungary.
J. M. Huerta, S. J. Chen, and R. M. Stern, “The 1998 Carnegie Mellon University Sphinx-3 Spanish Broadcast News Transcription System,” Proc. of the DARPA Broadcast News Transcription and Understanding Workshop, March, 1999, Herndon, Virginia.
P. J. Moreno, B. Raj, and R. M. Stern, “Data-Driven Environmental Compensation for Speech Recognition: A Unified Approach,” Speech Communication, 24: 267-285, 1998.
J. M. Huerta and R. M. Stern, “Speech Recognition From GSM Codec Parameters,” Proc. of the International Conference on Spoken Language Processing, Sydney, Australia, November, 1998.
B. Raj, R. Singh, and R. M. Stern, “Inference of Missing Spectrographic Features for Robust Speech Recognition,” Proc. of the International Conference on Spoken Language Processing, Sydney, Australia, November, 1998.
R. M. Stern, B. Raj, and P. J. Moreno, “Compensation for Environmental Degradation in Automatic Speech Recognition,” Proc. of the ESCA Tutorial and Research Workshop on Robust Speech Recognition for Unknown Communication Channels, April, 1997, Pont-à-Mousson, France, pp. 33-42.
M. A. Siegler, U. Jain, B. Raj, and R. M. Stern, “Automatic Segmentation, Classification and Clustering of Broadcast News Audio,” Proc. of the Speech Recognition Workshop (DARPA), Chantilly, VA, Feb. 1997.
J. M. Huerta, E. Thayer, M. Ravishankar, and R. M. Stern, “The Development of the 1997 CMU Spanish Broadcast News Transcription System,” Proc. of the DARPA Broadcast News Transcription and Understanding Workshop, February, 1998, Lansdowne, Virginia.
E. Gouvêa and R. M. Stern, “Speaker Normalization Through Formant-Based Warping Of The Frequency Scale,” Proc. of EUROSPEECH, 1997.
B. Raj, E. Gouvêa, and R. M. Stern, “Vector Polynomial Approximations For Robust Speech Recognition,” Proc. of the ESCA Tutorial and Research Workshop on Robust Speech Recognition for Unknown Communication Channels, Pont-à-Mousson, France, April, 1997.
B. Raj, V. N. Parikh, and R. M. Stern, “The Effects Of Background Music On Speech Recognition Accuracy,” Proc. of the ICASSP, Munich, Germany, April 1997.
J. M. Huerta and R. M. Stern, “Compensation for Environmental and Speaker Variability by Normalization of Pole Locations,” Proc. Eurospeech-97, September, 1997, Rhodes, Greece.
R. M. Stern, A. Acero, F.-H. Liu, and Y. Ohshima, “Signal Processing for Robust Speech Recognition,” Chapter in Speech Recognition, pp. 351-378, C.-H. Lee and F. Soong, Eds., Boston: Kluwer Academic Publishers, 1996.
P. J. Moreno, B. Raj, and R. M. Stern, “A Vector Taylor Series Approach For Environment-Independent Speech Recognition,” Proc. of the ICASSP, Atlanta, GA, May 1996.
B. Raj, E. Gouvêa, P. J. Moreno, and R. M. Stern, “Cepstral Compensation By Polynomial Approximation For Environment-Independent Speech Recognition,” Proc. of the ICSLP, Philadelphia, PA, Oct. 1996.
E. B. Gouvea, P. J. Moreno, B. Raj, T. M. Sullivan, and R. M. Stern, “Adaptation and Compensation: Approaches To Microphone And Speaker Independence In Automatic Speech Recognition,” Proceedings of the ARPA Workshop on Speech Recognition Technology, Harriman, NY, Morgan Kaufmann, D. Pallett, Ed.
U. Jain, M. A. Siegler, S.-J. Doh, E. Gouvea, P. J. Moreno, B. Raj, and R. M. Stern, “Recognition Of Continuous Broadcast News With Multiple Unknown Speakers And Environments,” Proceedings of the ARPA Workshop on Speech Recognition Technology, Harriman, NY, Morgan Kaufmann, D. Pallett, Ed.
P. J. Moreno, B. Raj, E. Gouvêa, and R. M. Stern, “Multivariate-Gaussian-Based Cepstral Normalization for Robust Speech Recognition,” Proc. of the ICASSP, Detroit, Michigan, 1995.
M. A. Siegler and R. M. Stern, “On the Effects of Speech Rate in Large Vocabulary Speech Recognition Systems,” Proc. of the ICASSP, Detroit, Michigan, 1995.
P. J. Moreno, B. Raj, and R. M. Stern, “A Unified Approach to Robust Speech Recognition,” Proc. of Eurospeech-95, Madrid, Spain, September, 1995.
P. J. Moreno, M. A. Siegler, U. Jain, and R. M. Stern, “Continuous Speech Recognition of Large Vocabulary Telephone Quality Speech,” Proc. of the Eighth Spoken Language Systems Technology Workshop, 1995.
P. J. Moreno, U. Jain, B. Raj, and R. M. Stern, “Approaches to Microphone Independence in Automatic Speech Recognition,” Proc. of the Eighth Spoken Language Systems Technology Workshop, 1995.
P. J. Moreno, B. Raj, and R. M. Stern, “Approaches to Environment Compensation in Automatic Speech Recognition,” Proc. 15th International Conference on Acoustics, Trondheim, Norway, Vol. III, pp. 109-112, June, 1995.
R. M. Stern and T. M. Sullivan, “Robust Speech Recognition Based on Human Binaural Perception,” Proc. of the ATR workshop on A Biological Framework for Speech Perception and Production, Kansai Science City, September, 1994; reprinted as ATR Technical Report TR-H-121, 1995.
F.-H. Liu, R. M. Stern, A. Acero, and P. J. Moreno, “Environment Normalization for Robust Speech Recognition using Direct Cepstral Comparison,” Proc. of the ICASSP, Adelaide, Australia, 1994.
P. J. Moreno and R. M. Stern, “Sources of Degradation of Speech Recognition in the Telephone Network,” Proc. of the ICASSP, Adelaide, Australia, 1994.
R. M. Stern, F.-H. Liu, P. J. Moreno, and A. Acero, “Signal Processing for Robust Speech Recognition,” Proc. of the International Conference on Spoken Language Processing, Yokohama, Japan, September, 1994.
N. Hanai and R. M. Stern, “Robust Speech Recognition in the Automobile,” Proc. of the International Conference on Spoken Language Processing, Yokohama, Japan, September, 1994.
Y. Ohshima and R. M. Stern, “Environmental Robustness in Automatic Speech Recognition Using Physiologically-Motivated Signal Processing,” Proc. of the International Conference on Spoken Language Processing, Yokohama, Japan, September, 1994.
F.-H. Liu, P. J. Moreno, R. M. Stern, and A. Acero, “Signal Processing For Robust Speech Recognition,” Proceedings of the Seventh ARPA Workshop on Human Language Technology, Princeton, New Jersey, Morgan Kaufmann, C. J. Weinstein, Ed.
F.-H. Liu, P. J. Moreno, R. M. Stern, and A. Acero, “Signal Processing For Robust Speech Recognition,” Proceedings of the ARPA Workshop on Spoken Language Technology, Princeton, New Jersey, March, 1994, R. M. Stern, Ed.
T. M. Sullivan and R. M. Stern, “Multi-Microphone Correlation-Based Processing for Robust Speech Recognition,” Proc. of the ICASSP, Minneapolis, Minnesota, April, 1993.
F.-H. Liu, R. M. Stern, X. Huang, and A. Acero, “Efficient Cepstral Normalization For Robust Speech Recognition,” Proc. of the Sixth ARPA Workshop on Human Language Technology, Princeton, NJ, Morgan Kaufmann, March, 1993.
R. M. Stern, F.-H. Liu, Y. Ohshima, T. M. Sullivan, and A. Acero, “Multiple Approaches to Robust Speech Recognition,” Proc. of the Fifth DARPA Speech and Natural Language Workshop, Harriman, New York, February, 1992.
F.-H. Liu, A. Acero, and R. M. Stern, “Efficient Joint Compensation of Speech for the Effects of Additive Noise and Linear Filtering,” Proc. of the ICASSP, San Francisco, CA, March, 1992.
R. M. Stern, F.-H. Liu, Y. Ohshima, T. M. Sullivan, and A. Acero, “Multiple Approaches to Robust Speech Recognition,” Proc. of the ICSLP, 1992.
A. Acero and R. M. Stern, “Robust Speech Recognition by Normalization of the Acoustic Space,” Proc. of the ICASSP, Toronto, Ontario, 1991.
W. A. Rozzi and R. M. Stern, “Fast Estimation of Mean Vectors using Adaptive Filtering,” Proc. of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Toronto, Ontario, pp. 865-868, 1991.
A. Acero and R. M. Stern, “Environmental Robustness in Automatic Speech Recognition,” Proc. of the ICASSP, Albuquerque, New Mexico, 1990.
A. Acero and R. M. Stern, “Toward Microphone-Independent Spoken Language Systems,” Proceedings of the DARPA Speech and Natural Language Workshop, Hidden Valley, PA, R. M. Stern, Ed., Morgan Kaufmann Publishers, Inc., San Mateo, CA, 1990.
A. Acero and R. M. Stern, “Acoustical Pre-Processing for Robust Spoken Language Systems,” Proc. First International Conference on Spoken Language Processing, pp. 1121-1124, Kobe, Japan, November, 1990.
“Classic” robust papers (pre-1990)
Original description of extended maximum a posteriori probability (EMAP) speaker adaptation:
R. M. Stern and M. J. Lasry, “Dynamic Speaker Adaptation for Feature-Based Isolated Letter Recognition,” IEEE Trans. on Acoustics, Speech, and Signal Processing 35: 751-763, 1987.
M. J. Lasry and R. M. Stern, “A Posteriori Estimation of Correlated Jointly Gaussian Mean Vectors,” IEEE Trans. on Pattern Anal. and Mach. Intel. 6: 530-535, 1984.
M. J. Lasry and R. M. Stern, “Unsupervised Adaptation to New Speakers in Feature-Based Letter Recognition,” Proc. IEEE Conf. on Acoustics, Speech, and Sig. Proc., San Diego, California, May, 1984.
R. M. Stern and M. J. Lasry, “Dynamic Speaker Adaptation for Isolated Letter Recognition Using MAP Estimation,” Proc. IEEE Conf. on Acoustics, Speech, and Sig. Proc., Boston, Massachusetts, May, 1983.
External publications
Armando Varela, Heriberto Cuayáhuitl, and Juan Arturo Nolazco-Flores, “Creating a Mexican Spanish Version of the CMU Sphinx-III Speech Recognition System,” CIARP 2003, LNCS 2905, pp. 251-258, 2003.
Foad Hamidi, “Using interactive objects for speech intervention,” ACM SIGACCESS Accessibility and Computing, Issue 96 (January 2010), pp. 28-31, ISSN 1558-2337.
K. Gupta and J. D. Owens, “Three-layer optimizations for fast GMM computations on GPU-like parallel processors,” IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU 2009), 2009.