Dysarthric Speech Recognition Error Correction Using Weighted Finite State Transducers Based on Context-Dependent Pronunciation Variation
  • Authors: Woo Kyeong Seong (1) wkseong@gist.ac.kr
    Ji Hun Park (1) jh_park@gist.ac.kr
    Hong Kook Kim (1) hongkook@gist.ac.kr
  • Keywords: context-dependent pronunciation variation modeling – dysarthric speech recognition – weighted finite state transducers – error correction
  • Journal: Lecture Notes in Computer Science
  • Year: 2012
  • Volume: 7383
  • Issue: 1
  • Pages: 475-482
  • Affiliations: 1. School of Information and Communications, Gwangju Institute of Science and Technology (GIST), 1 Oryong-dong, Buk-gu, Gwangju 500-712, Korea
  • ISSN: 1611-3349
Abstract
In this paper, we propose a dysarthric speech recognition error correction method based on weighted finite state transducers (WFSTs). First, the proposed method constructs a context-dependent (CD) confusion matrix by aligning each recognized word sequence with the corresponding reference sequence at the phoneme level. However, because the dysarthric speech database is too small to cover all combinations of context-dependent phonemes, the entries of the CD confusion matrix can be underestimated. To mitigate this data-sparseness problem, the CD confusion matrix is interpolated with a context-independent (CI) confusion matrix. Finally, WFSTs based on the interpolated CD confusion matrix are built and composed with the dictionary and language model transducers in order to correct speech recognition errors. The effectiveness of the proposed method is demonstrated through speech recognition experiments: the average word error rate (WER) of a speech recognition system employing the proposed error correction method with the CD confusion matrix is relatively reduced by 13.68% and 5.93% compared to the baseline speech recognition system and to the error correction method with the CI confusion matrix, respectively.
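The two estimation steps the abstract describes — phoneme-level alignment of recognized against reference sequences, then interpolation of the resulting context-dependent counts with context-independent ones — can be sketched roughly as below. This is a minimal Python illustration, not the authors' implementation: the single left-phoneme context, the fixed interpolation weight `lam`, and all function names are assumptions.

```python
from collections import defaultdict

def align_phonemes(ref, hyp):
    """Levenshtein-align a reference and a recognized phoneme sequence.

    Returns (ref_phone, hyp_phone) pairs; None marks a deletion
    (ref phone dropped) or an insertion (spurious hyp phone).
    """
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]          # edit-distance table
    for i in range(1, n + 1):
        d[i][0] = i
    for j in range(1, m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    pairs, i, j = [], n, m                              # backtrace
    while i > 0 or j > 0:
        if i > 0 and j > 0 and \
           d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            pairs.append((ref[i - 1], hyp[j - 1]))      # match / substitution
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            pairs.append((ref[i - 1], None))            # deletion
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))            # insertion
            j -= 1
    return pairs[::-1]

def confusion_probs(aligned_utts, lam=0.5):
    """Estimate P(hyp | left_context, ref) by interpolating the sparse
    context-dependent (CD) counts with context-independent (CI) ones:
        P = lam * P_CD + (1 - lam) * P_CI
    The left-phoneme context and lam are illustrative choices only.
    """
    cd = defaultdict(lambda: defaultdict(int))  # (left, ref) -> hyp -> count
    ci = defaultdict(lambda: defaultdict(int))  # ref -> hyp -> count
    for pairs in aligned_utts:
        left = "<s>"
        for r, h in pairs:
            key = r if r is not None else "<eps>"
            cd[(left, key)][h] += 1
            ci[key][h] += 1
            if r is not None:
                left = r
    probs = {}
    for (left, r), hyps in cd.items():
        cd_tot = sum(hyps.values())
        ci_tot = sum(ci[r].values())
        for h in set(hyps) | set(ci[r]):
            p_cd = hyps.get(h, 0) / cd_tot
            p_ci = ci[r].get(h, 0) / ci_tot
            probs[(left, r, h)] = lam * p_cd + (1 - lam) * p_ci
    return probs
```

In the full method, each interpolated probability would then become the weight of an arc `ref:hyp` in a confusion transducer, which is composed with the dictionary and language-model transducers to rescore the recognizer output.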
