Study of Speaker Recognition System

Original title: 说话人识别系统的研究
Author: Liu Yonghong (刘永红)
Thesis level: Master's
Discipline: Power Systems and Automation
Keywords: Speech Recognition ; Energy Frequency Value ; Mel Frequency Cepstrum Coefficient Difference ; Linear Prediction Cepstrum Difference ; Dynamic Time Warping ; Gaussian Mixture Model
Degree year: 2003
Advisors: Xiao Jian (肖建) ; Jia Junbo (贾俊波)
Discipline code: 080802
Degree-granting institution: Southwest Jiaotong University (西南交通大学)
Submission date: 2003-04-01

Abstract

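Speaker recognition is the automatic identification of a speaker from his or her voice, and it has promising applications in many fields. Based on an analysis of speech feature parameters and of the basic methods of speaker recognition, this thesis proposes a text-dependent speaker identification system that uses Mel cepstrum differences and linear prediction cepstrum differences as features and dynamic time warping (DTW) for recognition.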
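     The analysis begins with preprocessing of the speech signal. Endpoint detection removes the silent segments, so that only useful speech segments are passed on to feature extraction. The thesis also compares the double-threshold endpoint detection method with the energy-frequency-value endpoint detection algorithm; experiments confirm that the energy-frequency-value algorithm locates the endpoints of noisy speech well.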
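As a rough illustration of the endpoint-detection step, the Python sketch below scores each frame by an energy-frequency value and keeps the frames above a relative threshold. It assumes the energy-frequency value is the product of short-time energy and zero-crossing count, which the abstract does not spell out. The function names, frame length, hop size, and threshold ratio are hypothetical choices, not the settings used in the thesis.

```python
import math

def frame_signal(x, frame_len, hop):
    """Split a sample list into overlapping frames (trailing partial frame dropped)."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(s * s for s in frame)

def zero_crossing_count(frame):
    """Number of sign changes between consecutive samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))

def energy_frequency_values(x, frame_len=200, hop=80):
    # Assumed definition: per-frame energy multiplied by zero-crossing count.
    return [short_time_energy(f) * zero_crossing_count(f)
            for f in frame_signal(x, frame_len, hop)]

def detect_endpoints(x, frame_len=200, hop=80, ratio=0.05):
    """Return (first, last) index of frames above ratio * max value, or None."""
    efv = energy_frequency_values(x, frame_len, hop)
    if not efv or max(efv) == 0:
        return None
    threshold = ratio * max(efv)  # hypothetical relative threshold
    active = [i for i, v in enumerate(efv) if v > threshold]
    return active[0], active[-1]

# Synthetic example: silence, a 500 Hz tone sampled at 8 kHz, then silence.
fs = 8000
silence = [0.0] * 1600
tone = [math.sin(2 * math.pi * 500 * n / fs) for n in range(3200)]
start_frame, end_frame = detect_endpoints(silence + tone + silence)
```

Multiplying energy by the crossing count suppresses frames that are loud but slowly varying as well as frames that are noisy but weak, which is one plausible reason such a measure would separate speech from background noise better than energy alone.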
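     Using the all-pole model, the thesis extracts the linear prediction coefficients (LPC) of the speech signal, derives the corresponding cepstral coefficients (LPCC), and obtains the LPCC differences, which describe the dynamic variation of the speaker's vocal tract. Mel cepstral coefficients, which reflect the nonlinear frequency characteristics of hearing, serve as the speech recognition feature used to identify the password spoken by the speaker.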
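The chain from LPC to LPCC to difference features can be sketched as follows. This is a minimal Python illustration, assuming the all-pole model is written as H(z) = G / (1 - sum_k a_k z^-k), the LPC are obtained by the autocorrelation method with the Levinson-Durbin recursion, and the difference features use a standard regression window. The function names and the window width are hypothetical, and the thesis itself relies on a MATLAB toolbox rather than this code.

```python
def autocorrelation(frame, order):
    """Autocorrelation values r[0..order] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(order + 1)]

def levinson_durbin(r, order):
    """Solve for predictor coefficients a_1..a_order from autocorrelation r."""
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # signal already perfectly predicted; stop early
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]

def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients c_1..c_n_ceps of the all-pole model (gain term omitted)."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] is unused
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]

def delta(frames, width=2):
    """Regression-based difference features; edge frames are repeated as padding."""
    T = len(frames)
    denom = 2 * sum(k * k for k in range(1, width + 1))
    out = []
    for t in range(T):
        row = []
        for d in range(len(frames[t])):
            num = sum(k * (frames[min(T - 1, t + k)][d] - frames[max(0, t - k)][d])
                      for k in range(1, width + 1))
            row.append(num / denom)
        out.append(row)
    return out
```

For a one-pole model with coefficient a, the recursion reproduces the closed form c_n = a^n / n, which gives a quick sanity check on the sign convention.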
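     Feature parameters of the input speech are extracted with the MATLAB speech processing toolbox, and DTW is used to match the reference templates against the test templates, giving a very high recognition rate. For security, the system uses a two-stage decision: Mel cepstral coefficients recognize the password, and LPCC differences capture the dynamic variation of the speaker's vocal tract. This makes the system a candidate for high-security settings, with the advantages of fast operation, easy template updating, low computational cost, and a low error rate.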
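The template-matching step and the two-stage decision can be sketched in Python as below. This is only an illustration under stated assumptions: the local distance is Euclidean, the allowed DTW steps are the three basic ones (horizontal, vertical, diagonal), and `two_stage_accept` with its two thresholds is a hypothetical helper, since the abstract does not give the decision rule or any threshold values.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def dtw_distance(ref, test):
    """Accumulated DTW distance between a reference and a test template.

    Both arguments are lists of feature vectors, one vector per frame.
    """
    n, m = len(ref), len(test)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(ref[i - 1], test[j - 1])
            # Extend the cheapest of the three allowed predecessor paths.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def two_stage_accept(mfcc_dist, lpcc_delta_dist, password_thr, speaker_thr):
    """Hypothetical rule: accept only if both the password and the speaker match."""
    return mfcc_dist <= password_thr and lpcc_delta_dist <= speaker_thr

# A template and a time-stretched copy of it align with zero accumulated distance.
template = [[0.0], [1.0], [2.0]]
stretched = [[0.0], [0.0], [1.0], [1.0], [2.0], [2.0]]
same_word = dtw_distance(template, stretched)
```

Because each enrolled utterance is stored directly as a template, updating a user only means replacing that template, which is consistent with the easy template updating the abstract mentions.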
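     To compare recognition algorithms, the thesis also develops a text-independent speaker recognition system that uses Mel cepstral coefficients and their differences as features and builds a Gaussian mixture model (GMM) for each speaker. It achieves a fairly high recognition rate and is suitable for applications where the required recognition rate is not especially high.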
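The identification step with Gaussian mixture speaker models can be sketched as follows. The Python code assumes diagonal covariances and models that have already been trained (for example by EM, which is not shown); the model layout `(weights, means, variances)` and all function names are hypothetical. The speaker whose model gives the highest average log-likelihood over the test frames is selected.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of one frame under a GMM, computed with log-sum-exp."""
    terms = [math.log(w) + log_gauss_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def average_log_likelihood(frames, model):
    """Mean per-frame log-likelihood; model is (weights, means, variances)."""
    return sum(gmm_log_likelihood(x, *model) for x in frames) / len(frames)

def identify(frames, models):
    """Return the name of the speaker model that best explains the frames."""
    return max(models, key=lambda name: average_log_likelihood(frames, models[name]))

# Toy one-dimensional models; real features would be MFCC plus difference vectors.
models = {
    "speaker_a": ([0.5, 0.5], [[0.0], [1.0]], [[1.0], [1.0]]),
    "speaker_b": ([1.0], [[10.0]], [[1.0]]),
}
best = identify([[0.2], [0.9]], models)
```

Since the score averages over frames without regard to their order, it does not depend on what was said, which is what makes this approach text-independent.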
Speaker recognition is the process of automatically recognizing who is speaking on the basis of individual information included in speech signals. It has good application prospects in many fields. By analyzing speech feature parameters and the basic methods of speaker recognition, we choose the MFCC and LPCC differences as the speech feature parameters. Using DTW for text-dependent recognition, we have developed a speaker identification system.
    Before the feature parameters are extracted, the speech signal is preprocessed. In this phase, we detect the endpoints of the signal and remove the silent segments in order to provide useful speech segments. We compare two endpoint detection methods: the double-threshold method and the energy-frequency-value method. Experiments show that the latter locates the endpoints of noisy speech better.
    We use the all-pole model to obtain the LPC of the speech signal, derive its LPCC, and use the LPCC difference to describe the dynamic movement of the speaker's vocal tract. In addition, since the MFCC reflects the nonlinear frequency characteristics of hearing, we use it as another recognition feature to distinguish the input passwords.
    We use the MATLAB Voice Box toolbox to extract the feature parameters of the speech, use DTW to match the reference templates against the test templates, and obtain a very high recognition rate. Considering system security, we adopt the MFCC to recognize the password and the LPCC difference to represent the dynamic movement of the speaker's vocal tract. This double decision makes the system applicable in high-security situations. The system has many merits, such as fast operation, easy template updating, low computational cost, and a low error rate.
    In order to compare recognition algorithms, we also develop a text-independent speaker recognition system. We use the MFCC and its difference as the feature, build a Gaussian mixture model, and obtain a fairly high recognition rate. It can be applied in situations where the required recognition rate is not too high.
