CVX-Optimized Beamforming and Vector Taylor Series Compensation with German ASR Employing Star-Shaped Microphone Array
Abstract
This paper addresses the problem of distant speech recognition in reverberant, noisy conditions using a star-shaped microphone array and vector Taylor series (VTS) compensation. First, a beamformer yields an enhanced single-channel signal by applying convex (CVX) optimization over three spatial dimensions, given the spatio-temporal position of the target speaker as prior knowledge. Then, VTS compensation is applied to the speech features extracted from the temporal signal obtained by the beamformer. Finally, the compensated features are used for speech recognition. Because existing German resources for evaluating the proposed enhancement framework are lacking, this paper also introduces a new speech database. In particular, we present a medium-vocabulary German database for microphone arrays, made of embedded clean signals contaminated with real room impulse responses and mixed in a natural way with real noises. We show that the proposed enhancement framework performs better than other related systems on the presented database.