Cross Modal Evaluation of High Quality Emotional Speech Synthesis with the Virtual Human Toolkit
  • Keywords: Speech synthesis; Unit selection; Expressive speech synthesis; Emotion; Prosody; Facial animation
  • Journal: Lecture Notes in Computer Science
  • Publication year: 2016
  • Volume: 10011
  • Issue: 1
  • Pages: 190-197
  • Full text size: 540 KB
  • Author affiliations: Blaise Potard (19)
    Matthew P. Aylett (19) (20)
    David A. Braude (19)

    19. CereProc Ltd., Edinburgh, UK
    20. University of Edinburgh, Edinburgh, UK
  • Series: Intelligent Virtual Agents
  • ISBN:978-3-319-47665-0
  • Journal category: Computer Science
  • Journal subjects: Artificial Intelligence and Robotics
    Computer Communication Networks
    Software Engineering
    Data Encryption
    Database Management
    Computation by Abstract Devices
    Algorithm Analysis and Problem Complexity
  • Publisher: Springer Berlin / Heidelberg
  • ISSN:1611-3349
  • Volume order: 10011
Abstract
Emotional expression is a key requirement for intelligent virtual agents. In order for an agent to produce dynamic spoken content, speech synthesis is required. However, despite substantial work with pre-recorded prompts, very little work has explored the combined effect of high-quality emotional speech synthesis and facial expression. In this paper we offer a baseline evaluation of the naturalness and emotional range available by combining the freely available SmartBody component of the Virtual Human Toolkit (VHTK) with the CereVoice text-to-speech (TTS) system. Results echo previous work using pre-recorded prompts: the visual modality is dominant and the modalities do not interact. This allows the speech synthesis to add gradual changes to the perceived emotion, both in terms of valence and activation. The reported naturalness is good: 3.54 on a 5-point MOS scale.
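The abstract reports naturalness on a 5-point MOS scale and describes perceived emotion in terms of valence and activation. As a purely illustrative aid, the short Python sketch below shows one way such listener ratings could be aggregated; the condition labels, rating values, and helper functions (mos, mean_emotion) are assumptions for illustration and are not taken from the paper.

```python
# Illustrative sketch (not from the paper): aggregating listener ratings from a
# cross-modal evaluation. The conditions, scores, and labels below are invented
# examples; the paper's actual stimuli and data are not reproduced here.
from statistics import mean, stdev

# Hypothetical naturalness ratings on a 5-point MOS scale (1 = bad, 5 = excellent).
naturalness_ratings = {
    "neutral_tts": [4, 3, 4, 4, 3],
    "emotional_tts": [3, 4, 4, 3, 4],
}

# Hypothetical perceived-emotion judgements on a 2-D valence/activation plane,
# each coordinate in [-1, 1], as in dimensional models of emotion.
emotion_ratings = {
    "happy_voice_happy_face": [(0.6, 0.5), (0.7, 0.4), (0.5, 0.6)],
    "sad_voice_happy_face": [(0.3, 0.1), (0.4, 0.0), (0.2, 0.2)],
}

def mos(scores):
    """Mean opinion score with a simple spread estimate."""
    return mean(scores), stdev(scores)

def mean_emotion(points):
    """Average valence/activation coordinates for one stimulus condition."""
    valences = [v for v, _ in points]
    activations = [a for _, a in points]
    return mean(valences), mean(activations)

if __name__ == "__main__":
    for condition, scores in naturalness_ratings.items():
        m, s = mos(scores)
        print(f"{condition}: MOS = {m:.2f} (sd {s:.2f})")
    for condition, points in emotion_ratings.items():
        v, a = mean_emotion(points)
        print(f"{condition}: valence = {v:+.2f}, activation = {a:+.2f}")
```

Under this kind of aggregation, comparing conditions that hold the facial animation constant while varying the synthesized emotion would show whether the speech modality shifts mean valence/activation judgements, which is the gradual effect the abstract describes.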
