Research on Multi-Speech Separation Methods for Speech Enhancement
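Abstract
Speech is an important carrier of human information transfer and communication. Whether speech quality can be guaranteed affects not only what the human ear perceives but also every stage of a speech processing system. In real environments, various kinds of interference contaminate speech signals and noticeably degrade speech quality. Speech enhancement aims to remove such interference and recover the original clean speech signal as far as possible. Different types of interference call for different speech enhancement methods. Speech separation, which removes interfering speech, is one of the current research hotspots in speech enhancement.
     This paper studies speech enhancement by multi-speech separation and covers the following three aspects:
     (1) Blind source separation of speech signals based on independent component analysis. When the number of observed signals is not less than the number of source signals, independent component analysis solves the blind source separation problem well and removes speech interference effectively. The core of independent component analysis is solving for the demixing matrix (the inverse of the mixing matrix); the source signals are then obtained by multiplying the demixing matrix directly with the observed signal vector. Building on the fast independent component analysis algorithm, this paper studies a more effective improved fast algorithm and an improved algorithm that exploits the short-time stationarity of speech signals, in order to improve the accuracy of the demixing matrix and the quality of the recovered source signals.
     (2) Underdetermined blind identification based on clustering. For the underdetermined blind source separation problem, in which there are fewer observed signals than source signals, independent component analysis no longer applies, and estimating the mixing matrix requires exploiting the sparsity of the source signals. This paper studies clustering-based underdetermined blind identification algorithms and presents a method that estimates the mixing matrix with the iterative self-organizing data analysis technique (ISODATA). A preprocessing step that removes outliers and a post-processing step that progressively removes the sample with the largest deviation in each cluster further improve the stability of the algorithm and the accuracy of the estimated mixing matrix.
     (3) Layer-by-layer separation of underdetermined speech mixtures. In underdetermined blind source separation, the sparsity of the source signals is usually exploited and statistical methods are used to separate the sources. Because speech signals are not sufficiently sparse and do not strictly satisfy the disjoint orthogonality condition, the separated sources exhibit noticeable mutual interference and musical noise. This paper applies successive transformations to the mixing matrix so that the source signals are cancelled one by one from the mixed signals, and constructs multi-layer binary masks from the zero-valued points that successively appear in each mixed signal. The sources are thereby separated layer by layer, which suppresses the mutual interference and musical noise among the separated sources to some extent and improves separation quality.
     Computer simulation results demonstrate the effectiveness of the above methods.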
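As an illustration of item (1), the sketch below estimates a demixing matrix with a plain symmetric fixed-point ICA iteration (tanh nonlinearity) and recovers the sources by multiplying that matrix with the observation vector. It is a minimal sketch under stated assumptions, not the improved algorithms studied in this paper: the sources are synthetic Laplacian signals standing in for speech, the 2 x 2 mixing matrix `a` is made up, and the function names (`whiten`, `sym_decorrelate`, `fastica_sketch`) are hypothetical.

```python
import numpy as np

def whiten(x):
    """Center and whiten observations x (channels x samples)."""
    x = x - x.mean(axis=1, keepdims=True)
    eigval, eigvec = np.linalg.eigh(np.cov(x))
    v = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T  # whitening matrix
    return v @ x, v

def sym_decorrelate(w):
    """Symmetric orthogonalization: W <- (W W^T)^(-1/2) W."""
    eigval, eigvec = np.linalg.eigh(w @ w.T)
    return eigvec @ np.diag(eigval ** -0.5) @ eigvec.T @ w

def fastica_sketch(x, n_iter=200, tol=1e-8, seed=0):
    """Estimate a demixing matrix for x using a tanh fixed-point iteration."""
    z, v = whiten(x)
    n = z.shape[0]
    rng = np.random.default_rng(seed)
    w = sym_decorrelate(rng.standard_normal((n, n)))
    for _ in range(n_iter):
        y = w @ z
        g = np.tanh(y)
        g_prime = 1.0 - g ** 2
        # Fixed-point update for every row: E{g(y) z^T} - diag(E{g'(y)}) W
        w_new = (g @ z.T) / z.shape[1] - np.diag(g_prime.mean(axis=1)) @ w
        w_new = sym_decorrelate(w_new)
        # Rows are unit-norm, so |<w_new_i, w_i>| close to 1 means no further rotation.
        converged = np.max(np.abs(np.abs(np.sum(w_new * w, axis=1)) - 1.0)) < tol
        w = w_new
        if converged:
            break
    return w @ v  # demixing matrix acting on the centered observations

# Assumed toy setup: two synthetic super-Gaussian sources and a made-up mixing matrix.
rng = np.random.default_rng(1)
n_samples = 20000
s = rng.laplace(size=(2, n_samples))
a = np.array([[1.0, 0.6], [0.4, 1.0]])
x = a @ s

w = fastica_sketch(x)
y = w @ (x - x.mean(axis=1, keepdims=True))  # recovered sources
```

The rows of `y` match the rows of `s` only up to permutation, sign, and scale, which is the usual ambiguity of blind source separation.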
Speech is very important for human information transmission and communication. The quality of speech influences not only human hearing but also the other stages of a speech processing system. Because of interference in the real world, speech quality is often degraded. Speech enhancement removes the interference in speech to recover the original clean speech. Different methods can be designed to eliminate different kinds of interference. Multiple-speech separation, which removes interfering speech, is one of the focuses of the speech enhancement field.
     This paper studies separation methods for multiple speech signals and mainly covers the following three aspects:
     (1) Blind source separation of speech signals based on independent component analysis. When the number of observed signals, which are linear mixtures of the sources, is not less than the number of sources, independent component analysis can solve the blind source separation problem effectively and remove speech interference well. The key to independent component analysis is calculating the demixing matrix. The sources can then be recovered by multiplying the demixing matrix with the observed signal vector. This paper builds on the fast independent component analysis algorithm and studies two more efficient versions of it to improve the accuracy of the demixing matrix and the quality of the recovered sources.
     (2) Underdetermined blind identification based on clustering. When the number of observed signals is less than the number of sources, known as the underdetermined condition, independent component analysis is no longer suitable for separation. It is necessary to exploit the sparsity of the signals to estimate the mixing matrix for further separation. This paper studies underdetermined blind identification based on clustering and gives a method that uses the iterative self-organizing data analysis technique algorithm to obtain the mixing matrix. A pre-processing step and a post-processing step are then proposed to improve the robustness of the algorithm and the accuracy of the estimated mixing matrix.
     (3) Underdetermined speech separation by a step-wise method. In the underdetermined condition, statistical methods that exploit the sparsity of the signals are usually adopted for separation. However, some mutual interference and musical noise remain in the separated sources, because speech sources do not strictly satisfy the W-disjoint orthogonality condition in the time-frequency domain, which leads to some overlap between sources. This paper uses the mixing matrix obtained from a clustering method and cancels the sources one by one from each mixed signal to produce corresponding zero-valued points in the mixed signals. Multiple binary masks are then constructed from the zero points to extract the disjoint or overlapped sources from the mixture and separate them step by step. This method suppresses the interference and musical noise of the separated signals to some extent and improves the separation quality.
     Experimental results demonstrate the effectiveness of the methods presented in this paper.
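To make the underdetermined case in items (2) and (3) concrete, the sketch below mixes three sparse sources into two observations, estimates the mixing matrix by clustering the directions of the observation vectors, and then separates the sources with a single-layer binary mask. Several things here are assumptions rather than the paper's method: plain k-means on angles with a known number of sources is swapped in for the iterative self-organizing data analysis technique, a simple low-energy threshold stands in for the outlier-removal step, the sources are synthetic and strictly disjoint (real speech is not), and the multi-layer step-wise masking is not implemented. All names and the mixing angles are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_src, n_samples = 3, 6000

# Synthetic, strictly disjoint sparse sources: exactly one source is active per sample.
active = rng.integers(0, n_src, size=n_samples)
s = np.zeros((n_src, n_samples))
s[active, np.arange(n_samples)] = rng.laplace(size=n_samples)

# Hypothetical 2 x 3 mixing matrix with unit-norm columns at 30, 80 and 140 degrees.
true_angles = np.deg2rad([30.0, 80.0, 140.0])
a_true = np.vstack([np.cos(true_angles), np.sin(true_angles)])
x = a_true @ s + 0.001 * rng.standard_normal((2, n_samples))

def estimate_mixing_matrix(x, k, n_iter=50, energy_quantile=0.2):
    """Cluster observation directions (k-means on angles) to estimate k mixing columns."""
    norms = np.linalg.norm(x, axis=0)
    keep = norms > np.quantile(norms, energy_quantile)  # drop low-energy, noisy points
    # Fold directions into [0, pi): a column and its negative span the same line.
    angles = np.mod(np.arctan2(x[1, keep], x[0, keep]), np.pi)
    centers = np.quantile(angles, (np.arange(k) + 0.5) / k)  # data-driven initialization
    for _ in range(n_iter):
        labels = np.argmin(np.abs(angles[:, None] - centers[None, :]), axis=1)
        centers = np.array([
            angles[labels == j].mean() if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
    centers = np.sort(centers)
    return np.vstack([np.cos(centers), np.sin(centers)])

def binary_mask_separate(x, a_hat):
    """Assign each sample to the best-aligned column and keep its projection."""
    proj = a_hat.T @ x  # (k, samples): projection onto each unit-norm column
    winner = np.argmax(np.abs(proj), axis=0)
    mask = np.zeros_like(proj)
    mask[winner, np.arange(x.shape[1])] = 1.0
    return mask * proj

a_hat = estimate_mixing_matrix(x, n_src)
s_hat = binary_mask_separate(x, a_hat)
```

Because the toy sources never overlap, one binary mask is enough here. With real speech the sources overlap at some points, and a single mask then leaves the mutual interference and musical noise that the step-wise method in item (3) is meant to reduce.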