Prosody modeling in spoken language systems.

Details

Author: Jeon, Je Hun
Degree: Doctor
Year: 2011
Advisor and committee: Liu, Yang (advisor); Busso, Carlos (committee member); Hansen, John H. L. (committee member); Ng, Vincent (committee member)
Institution: The University of Texas
Department: Computer Science
ISBN: 9781267165558
CBH: 3494541
Country: USA
Language: English
FileSize: 4813447
Pages: 134

Abstract

In spoken utterances, prosody is encoded in the form of pitch accent, intonation, and rhythm, and conveys linguistic and paralinguistic information such as the emphasis, intent, attitude, and emotion of a speaker. Humans listening to speech with natural prosody are able to understand the content with low cognitive load and high accuracy. Automatic extraction of prosodic information is necessary for machines to process speech with human levels of proficiency. This dissertation focuses on two approaches to using prosodic information: symbolic modeling and direct modeling of prosody. We first investigate symbolic modeling of prosody, that is, symbolic annotation of prosodic events such as pitch accents and prosodic phrase boundary tones. We develop acoustic and lexical/syntactic prosodic models, and combine the two models to improve the performance of symbolic annotation of prosodic events. We adopt a semi-supervised approach that uses a co-training algorithm to exploit unlabeled data for prosodic event annotation. We propose a novel labeling and selection scheme for the co-training algorithm to address its compatibility and uncorrelatedness assumptions, which often do not hold in real data. Furthermore, we use this symbolic modeling of prosody to help improve automatic speech recognition performance. Second, as a direct modeling approach, we present a novel technique to detect a user's interest level in conversations using prosodic cues in combination with other sources of information. Since a listener provides feedback in dialog, we expect the interest level to depend not only on how the person says something (represented by prosodic information), but also on what the person says (represented by lexical information). We develop a decision-level combination system using these two information sources and demonstrate better performance than relying on a single information source.
We believe this dissertation will contribute to our understanding of prosody in spoken language, and advance the use of prosody in spoken language processing towards the goal of human-like processing of speech by machines.
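
Illustrative note: the abstract does not specify how the decision-level combination is carried out. One common form of decision-level fusion is a weighted linear interpolation of the class posteriors produced by two independent classifiers. The sketch below assumes that form; the function names, the class labels ("high", "low"), the interpolation weight, and the example probabilities are all hypothetical and are not taken from the dissertation.

```python
def fuse_posteriors(p_prosody, p_lexical, weight=0.5):
    """Linearly interpolate two class-posterior dictionaries.

    `weight` is the share given to the prosodic classifier; the lexical
    classifier receives `1 - weight`. This is an assumed fusion rule,
    not the scheme used in the dissertation.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    labels = set(p_prosody) | set(p_lexical)
    fused = {
        label: weight * p_prosody.get(label, 0.0)
        + (1.0 - weight) * p_lexical.get(label, 0.0)
        for label in labels
    }
    total = sum(fused.values())
    if total <= 0.0:
        raise ValueError("posteriors must contain some probability mass")
    # Renormalize so the fused scores sum to one.
    return {label: score / total for label, score in fused.items()}


def decide(p_prosody, p_lexical, weight=0.5):
    """Return the label with the highest fused posterior."""
    fused = fuse_posteriors(p_prosody, p_lexical, weight)
    return max(fused, key=fused.get)


# Hypothetical posteriors for one utterance from the two classifiers.
prosody_scores = {"high": 0.7, "low": 0.3}
lexical_scores = {"high": 0.4, "low": 0.6}

equal_weight_label = decide(prosody_scores, lexical_scores, weight=0.5)
lexical_heavy_label = decide(prosody_scores, lexical_scores, weight=0.2)
```

In this sketch the two sources disagree, so the decision depends on the weight: with equal weights the prosodic evidence dominates, while a weight favoring the lexical classifier flips the label. In practice such a weight would typically be tuned on held-out data.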
