Prosody modeling in spoken language systems.
Details
  • Author: Jeon, Je Hun
  • Degree: Doctor
  • Year: 2011
  • Advisor and committee: Liu, Yang (advisor); Busso, Carlos (committee member); Hansen, John H. L. (committee member); Ng, Vincent (committee member)
  • Institution: The University of Texas
  • Department: Computer Science
  • ISBN:9781267165558
  • CBH:3494541
  • Country:USA
  • Language: English
  • FileSize:4813447
  • Pages:134
Abstract
In spoken utterances, prosody is encoded in the form of pitch accent, intonation, and rhythm, and conveys linguistic and paralinguistic information such as the emphasis, intent, attitude, and emotion of a speaker. Humans listening to speech with natural prosody are able to understand the content with low cognitive load and high accuracy. Automatic extraction of prosodic information is necessary for machines to process speech with human levels of proficiency. This dissertation focuses on two approaches to using prosodic information: symbolic and direct modeling of prosody. We first investigate symbolic modeling of prosody, that is, the symbolic annotation of prosodic events such as pitch accents and prosodic phrase boundary tones. We develop acoustic and lexical/syntactic prosodic models, and combine the two models to improve the performance of symbolic annotation of prosodic events. We adopt a semi-supervised approach that uses a co-training algorithm to exploit unlabeled data for prosodic event annotation. We propose a novel labeling and selection scheme for the co-training algorithm in order to address the compatibility and uncorrelatedness assumptions, which often do not hold in real data. Furthermore, we utilize such symbolic modeling of prosody to help improve automatic speech recognition performance. Second, as a direct modeling approach, we present a novel technique to detect a user's interest level in conversations using prosodic cues in combination with other sources of information. Since a listener provides feedback in dialog, we expect that the interest level depends not only on how the person says something (represented by prosodic information) but also on what the person says (represented by lexical information). We develop a decision-level combination system using these two information sources and demonstrate improved performance over relying on a single information source.
We believe this dissertation will contribute to our understanding of prosody in spoken language, and advance the use of prosody in spoken language processing towards the goal of human-like processing of speech by machines.
