Ensemble acoustic modeling in Automatic Speech Recognition.

Details

Author: Chen, Xin
Degree: Doctorate
Year: 2011
Advisor: Zhao, Yunxin
Institution: University of Missouri
ISBN: 9781267431653
CBH: 3515866
Country: USA
Language: English
File size: 5313354
Pages: 121

Abstract

Combining multiple acoustic models to improve overall acoustic model quality is a young and promising direction in Automatic Speech Recognition (ASR). Previous work on acoustic modeling of speech signals, such as Random Forests (RFs) of Phonetic Decision Trees (PDTs), has produced significant improvements in recognition accuracy. In this dissertation, several new approaches that use data sampling to construct an Ensemble of Acoustic Models (EAM) for speech recognition are proposed. A straightforward method of data sampling is Cross Validation (CV) data partition. To improve inter-model diversity within an EAM for speaker-independent speech recognition, we propose Speaker Clustering (SC) based data sampling and develop two algorithms: Likelihood-based Speaker Clustering (LSC) and speaker-model Distance-based Speaker Clustering (DSC). To improve base-model quality as well as inter-model diversity, we further investigate how several successful single-model training techniques in speech recognition affect the proposed ensemble acoustic models, including Cross Validation Expectation Maximization (CVEM), Discriminative Training (DT), and Multiple Layer Perceptron (MLP) features. We also propose an ensemble of Multiple models with Different Mixture Sizes (MDMS) to improve EAM quality. We have evaluated the proposed methods on the TIMIT speaker-independent phoneme recognition task and on a telemedicine automatic captioning task of speaker-dependent continuous speech recognition. The proposed EAMs have led to significant improvements in recognition accuracy over conventional Hidden Markov Model (HMM) baseline systems, and integrating ensemble acoustic models with CVEM, DT, and MLP has also significantly improved the accuracy of the corresponding CVEM-, DT-, and MLP-based single-model systems.
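The Cross Validation data partition mentioned above can be illustrated with a minimal sketch. It assumes that each base model is trained on the data with one fold held out, uses a single diagonal-covariance Gaussian as a stand-in for an HMM state's mixture density, and combines the base models by averaging their likelihoods with equal weights. The function names (`cv_partition`, `build_cv_ensemble`, `ensemble_log_likelihood`) are hypothetical and not taken from the dissertation.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical stand-in for a state density: (mean, variance) with diagonal covariance.
Gaussian = Tuple[List[float], List[float]]


def cv_partition(items: Sequence, k: int) -> List[List]:
    """Split items round-robin into k disjoint folds.

    Items could be speakers (giving speaker-disjoint folds) or frames;
    frames are used below only to keep the sketch short.
    """
    folds: List[List] = [[] for _ in range(k)]
    for i, item in enumerate(items):
        folds[i % k].append(item)
    return folds


def fit_diag_gaussian(frames: List[List[float]], floor: float = 1e-3) -> Gaussian:
    """Maximum-likelihood mean and floored variance per dimension."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, floor)
           for d in range(dim)]
    return mean, var


def log_gauss(x: Sequence[float], g: Gaussian) -> float:
    """Log density of a diagonal Gaussian at x."""
    mean, var = g
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def build_cv_ensemble(frames: List[List[float]], k: int) -> List[Gaussian]:
    """Train one base model per fold, each on all data except that fold (assumption)."""
    folds = cv_partition(frames, k)
    ensemble = []
    for held_out in range(k):
        train = [f for j, fold in enumerate(folds) if j != held_out for f in fold]
        ensemble.append(fit_diag_gaussian(train))
    return ensemble


def ensemble_log_likelihood(x: Sequence[float], ensemble: List[Gaussian]) -> float:
    """Log of the equal-weight average likelihood, computed with log-sum-exp."""
    scores = [log_gauss(x, g) for g in ensemble]
    top = max(scores)
    return top + math.log(sum(math.exp(s - top) for s in scores) / len(scores))
```

Averaging likelihoods (rather than log-likelihoods) keeps the combined score between the best and worst base-model scores, and log-sum-exp avoids underflow when frame likelihoods are tiny.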
We further investigate the largely unstudied factor of inter-model diversity and propose several methods to measure inter-model diversity explicitly. We demonstrate a positive relation between enlarging inter-model diversity and increasing EAM quality. HMM-based acoustic models built by data-sampling EAM are generally very large, especially when a large number of models is used or when full covariance matrices are used for the Gaussian densities. Therefore, the acoustic model needs to be compacted to a reasonable size for practical applications while maintaining reasonable performance. Toward this goal, this dissertation discusses and investigates several distance measures and algorithms for clustering. The distance measures include Entropy, KL (Kullback-Leibler), Bhattacharyya, Chernoff, and their weighted versions. For clustering algorithms, besides conventional greedy agglomerative clustering, algorithms such as N-Best distance Refinement (NBR), K-step LookAhead (KLA), and Breadth-First Search (BFS) are proposed. Experiments on the TIMIT task have shown that, compared with the original EAM model, the compacted models obtained by these clustering methods maintain model accuracy while their size is greatly reduced. Experiments on compacting an EAM for a Pashto ASR task have shown that the proposed clustering methods yield better quality than conventional clustering methods. Unlike the implicit PDT-based state tying used in most ASR systems, as well as in the recent RF-based PDTs, explicit PDT (EPDT) state tying that allows Phoneme data Sharing (PS) is considered for its potential to capture pronunciation variations. The ensemble approach of combining multiple acoustic models is applied to the EPDT, where a combination of explicit and implicit PDT models is investigated to reduce phone confusions.
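The conventional greedy agglomerative compaction described above can be sketched as repeatedly merging the closest pair of Gaussian components. This minimal sketch assumes diagonal covariances, merges a pair by moment matching (preserving the pair's total weight, mean, and variance), and uses the Bhattacharyya distance by default, with a symmetric KL divergence as an alternative. Entropy, Chernoff, the weighted variants, and the proposed NBR, KLA, and BFS algorithms are not shown, and all function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: (weight, mean vector, diagonal variance vector).
Gaussian = Tuple[float, List[float], List[float]]


def kl_diag(p: Gaussian, q: Gaussian) -> float:
    """KL(p || q) between the densities of two diagonal Gaussians (weights ignored)."""
    _, mp, vp = p
    _, mq, vq = q
    return 0.5 * sum(math.log(b / a) + (a + (x - y) ** 2) / b - 1.0
                     for x, a, y, b in zip(mp, vp, mq, vq))


def symmetric_kl(p: Gaussian, q: Gaussian) -> float:
    """Symmetrized KL divergence, usable as a clustering distance."""
    return kl_diag(p, q) + kl_diag(q, p)


def bhattacharyya_diag(p: Gaussian, q: Gaussian) -> float:
    """Bhattacharyya distance between two diagonal Gaussians (weights ignored)."""
    _, mp, vp = p
    _, mq, vq = q
    total = 0.0
    for x, a, y, b in zip(mp, vp, mq, vq):
        v = 0.5 * (a + b)  # averaged variance in this dimension
        total += (x - y) ** 2 / (8.0 * v) + 0.5 * math.log(v / math.sqrt(a * b))
    return total


def merge(p: Gaussian, q: Gaussian) -> Gaussian:
    """Moment-matching merge: keeps the total weight, mean and variance of the pair."""
    wp, mp, vp = p
    wq, mq, vq = q
    w = wp + wq
    mean = [(wp * x + wq * y) / w for x, y in zip(mp, mq)]
    var = [(wp * (a + x * x) + wq * (b + y * y)) / w - m * m
           for x, a, y, b, m in zip(mp, vp, mq, vq, mean)]
    return w, mean, var


def greedy_agglomerative(
    gaussians: Sequence[Gaussian],
    target: int,
    distance: Callable[[Gaussian, Gaussian], float] = bhattacharyya_diag,
) -> List[Gaussian]:
    """Repeatedly merge the closest pair until only `target` components remain."""
    pool = list(gaussians)
    while len(pool) > max(target, 1):
        best = None  # (distance, i, j) of the closest pair seen so far
        for i in range(len(pool)):
            for j in range(i + 1, len(pool)):
                d = distance(pool[i], pool[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = merge(pool[i], pool[j])
        pool = [g for k, g in enumerate(pool) if k not in (i, j)] + [merged]
    return pool
```

For example, merging `(0.25, [0.0], [1.0])` and `(0.25, [0.1], [1.0])` gives weight 0.5, mean 0.05, and variance 1.0025; the extra 0.0025 comes from the spread between the two means. The exhaustive pair search costs O(n²) distance evaluations per merge, which is fine for a sketch but would need caching for large models.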
