TY - GEN
T1 - Voice driven applications in non-stationary and chaotic environment
AU - Kwan, C.
AU - Li, X.
AU - Lao, D.
AU - Deng, Y.
AU - Ren, Z.
AU - Raj, B.
AU - Singh, R.
AU - Stern, R.
PY - 2005
Y1 - 2005
N2 - Automated operations based on voice commands will become increasingly important in many applications, including robotics and maintenance operations. However, voice command recognition rates drop substantially in non-stationary and chaotic noise environments. In this research, we sought to significantly improve speech recognition rates in non-stationary noise environments. First, 298 Navy acronyms were selected for automatic speech recognition. Data sets were collected under 4 types of non-stationary noise: factory, buccaneer jet, babble noise in a canteen, and destroyer. Within each noise environment, 4 Signal-to-Noise Ratio (SNR) conditions were used: 5 dB, 15 dB, 25 dB, and clean. Second, a new algorithm to estimate speech and non-speech regions was developed, implemented, and evaluated. Third, extensive simulations were carried out. The combination of the new algorithm, a proper selection of language model, and customized training of the speech recognizer on clean speech yielded very high recognition rates, ranging from 80% to 90% across the four noisy conditions. Fourth, extensive comparative studies were also carried out.
AB - Automated operations based on voice commands will become increasingly important in many applications, including robotics and maintenance operations. However, voice command recognition rates drop substantially in non-stationary and chaotic noise environments. In this research, we sought to significantly improve speech recognition rates in non-stationary noise environments. First, 298 Navy acronyms were selected for automatic speech recognition. Data sets were collected under 4 types of non-stationary noise: factory, buccaneer jet, babble noise in a canteen, and destroyer. Within each noise environment, 4 Signal-to-Noise Ratio (SNR) conditions were used: 5 dB, 15 dB, 25 dB, and clean. Second, a new algorithm to estimate speech and non-speech regions was developed, implemented, and evaluated. Third, extensive simulations were carried out. The combination of the new algorithm, a proper selection of language model, and customized training of the speech recognizer on clean speech yielded very high recognition rates, ranging from 80% to 90% across the four noisy conditions. Fourth, extensive comparative studies were also carried out.
KW - Non-stationary
KW - Speech recognition
KW - Voice commands
UR - https://www.scopus.com/pages/publications/33947699852
U2 - 10.1109/robio.2005.246250
DO - 10.1109/robio.2005.246250
M3 - Conference contribution
AN - SCOPUS:33947699852
SN - 0780393155
SN - 9780780393158
T3 - 2005 IEEE International Conference on Robotics and Biomimetics, ROBIO
SP - 127
EP - 132
BT - 2005 IEEE International Conference on Robotics and Biomimetics, ROBIO
PB - IEEE Computer Society
T2 - 2005 IEEE International Conference on Robotics and Biomimetics, ROBIO
Y2 - 5 July 2005 through 9 July 2005
ER -