ILATalk: a new multilingual text-to-speech synthesizer with machine learning


Author: Saleh M. Abu-Soud
Keywords: Text-to-speech; Synthesizer; Multilingual; NLP; Phonemes; Inductive learning; ILA
Journal: International Journal of Speech Technology
Publication year: 2016
Publication date: March 2016
Volume: 19
Issue: 1
Pages: 55-64
Full-text size: 774 KB
Author affiliation: Saleh M. Abu-Soud (1)

1. Department of Software Engineering, Princess Sumaya University for Technology, Amman, 11941, Jordan
Journal category: Engineering
Journal subjects: Signal, Image and Speech Processing
Social Sciences
Artificial Intelligence and Robotics
Publisher: Springer Netherlands
ISSN: 1572-8110

Abstract

This paper presents ILATalk, a new multilingual text-to-speech system based on inductive learning. The system is composed of three phases: the analysis phase, the learning phase, and the synthesis phase. It can accept any language; all that is needed is to store, in data tables used as input to the system, the phonemes of the required language together with a data set of training examples generated from a representative, selected subset of that language's words. The system has been thoroughly tested in many sets of experiments with various parameters and sizes, and compared with two known approaches: ID3 and neural-network backpropagation. The results show that ILATalk produces correct phonemes with high accuracy and outperforms these algorithms in most cases.
