Influence of binary mask estimation errors on robust speaker identification

Author: Tobias May (tobmay@elektro.dtu.dk)
Keywords: Speaker identification; Missing data; Ideal binary mask; Estimated binary mask; Bounded marginalization; Full marginalization; Direct masking
Journal: Speech Communication
Published: March 2017
Volume: 87
Issue: Complete
Pages: 40-48

Abstract

Missing-data strategies have been developed to improve the noise robustness of automatic speech recognition systems in adverse acoustic conditions. This is achieved by classifying time-frequency (T-F) units into reliable and unreliable components, as indicated by a so-called binary mask. Different approaches have been proposed to handle unreliable feature components, each with distinct advantages. The direct masking (DM) approach attenuates unreliable T-F units in the spectral domain, which allows conventional mel-frequency cepstral coefficients (MFCCs) to be extracted. Instead of attenuating unreliable components in the feature extraction front-end, full marginalization (FM) discards unreliable feature components in the classification back-end. Finally, bounded marginalization (BM) can be used to combine the evidence from both reliable and unreliable feature components during classification. Because each of these approaches uses knowledge of reliable and unreliable feature components in a different way, each will respond differently to estimation errors in the binary mask. The goal of this study was to identify the most effective strategy for exploiting knowledge of reliable and unreliable feature components in the context of automatic speaker identification (SID). A systematic evaluation under ideal and non-ideal conditions demonstrated that robustness to errors in the binary mask varied substantially across the missing-data strategies. Moreover, full and bounded marginalization showed complementary performance in stationary and non-stationary background noises, and they were therefore combined using a simple score fusion. The fused system consistently outperformed the individual SID systems in all considered experimental conditions.
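To make the difference between full and bounded marginalization concrete, the following is a minimal sketch (not the paper's implementation) of how each might score one frame under a diagonal-covariance Gaussian mixture speaker model, followed by a fusion step. Assumptions: features are log-compressed spectral energies, since marginalization operates on spectral-domain components. For bounded marginalization, an unreliable component's clean value is assumed to lie between a hypothetical floor `lower` and the observed noisy value. The fusion rule (z-normalizing each system's scores across speakers and averaging them with weight `alpha`) is an illustrative stand-in, not necessarily the score fusion used in the study. All function names and toy model values are hypothetical. Direct masking is not shown because it changes only the front-end (attenuating unreliable T-F units before MFCC extraction) and is then scored with a standard likelihood.

```python
import math

TINY = 1e-300  # floor that keeps log() finite when a bounded integral underflows


def log_gauss(x, mu, var):
    """Log density of the univariate Gaussian N(x; mu, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def gauss_cdf(x, mu, var):
    """Gaussian CDF, computed with math.erf from the standard library."""
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0 * var)))


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a non-empty list."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def frame_loglik(x, mask, gmm, mode, lower=0.0):
    """Missing-data log-likelihood of one log-spectral frame under a diagonal GMM.

    x:     observed (noisy) log-spectral energies, one value per channel
    mask:  binary mask, 1 = reliable (speech-dominated), 0 = unreliable
    gmm:   list of (weight, means, variances) tuples, one per component
    mode:  "full" drops unreliable channels (full marginalization);
           "bounded" integrates them over [lower, x_d] (bounded marginalization)
    lower: hypothetical lower bound on the clean log-energy
    """
    if mode not in ("full", "bounded"):
        raise ValueError("mode must be 'full' or 'bounded'")
    per_component = []
    for weight, means, variances in gmm:
        ll = math.log(weight)
        for xd, reliable, mu, var in zip(x, mask, means, variances):
            if reliable:
                ll += log_gauss(xd, mu, var)  # reliable channels are evaluated as usual
            elif mode == "bounded":
                # The noisy energy is treated as an upper bound on the clean energy.
                mass = gauss_cdf(xd, mu, var) - gauss_cdf(lower, mu, var)
                ll += math.log(max(mass, TINY))
            # mode == "full": the unreliable channel contributes nothing
        per_component.append(ll)
    return logsumexp(per_component)


def utterance_score(frames, masks, gmm, mode, lower=0.0):
    """Sum of frame log-likelihoods (frames assumed independent)."""
    return sum(frame_loglik(x, m, gmm, mode, lower) for x, m in zip(frames, masks))


def znorm(scores):
    """Map a {speaker: score} dict to zero mean and unit variance across speakers."""
    vals = list(scores.values())
    mean = sum(vals) / len(vals)
    std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals)) or 1.0
    return {spk: (s - mean) / std for spk, s in scores.items()}


def identify(frames, masks, models, alpha=0.5, lower=0.0):
    """Fuse normalized FM and BM scores; return (best speaker, fused scores)."""
    fm = znorm({s: utterance_score(frames, masks, g, "full", lower) for s, g in models.items()})
    bm = znorm({s: utterance_score(frames, masks, g, "bounded", lower) for s, g in models.items()})
    fused = {s: alpha * fm[s] + (1.0 - alpha) * bm[s] for s in models}
    return max(fused, key=fused.get), fused


# Toy example: two single-component speaker models over three channels.
models = {
    "spk_a": [(1.0, [2.0, 4.0, 1.0], [1.0, 1.0, 1.0])],
    "spk_b": [(1.0, [5.0, 1.0, 3.0], [1.0, 1.0, 1.0])],
}
frames = [[2.1, 3.8, 6.0], [1.9, 4.2, 5.5]]  # third channel is noise-dominated
masks = [[1, 1, 0], [1, 1, 0]]
best, fused = identify(frames, masks, models)
print(best)  # spk_a
```

The bounded integral is left unnormalized by the interval width 1/(x_d - lower). That factor depends only on the observation, not on the speaker model, so it shifts every speaker's score equally and does not change the identification decision. Errors in the mask enter the two strategies differently: a reliable channel wrongly labelled unreliable is simply ignored by FM, whereas BM still extracts partial evidence from it through the bound.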