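Research on Improved Feature Extraction Algorithms for Speaker Recognition
Topic: MFCC; Focus: smoothed amplitude spectrum envelope; Source: Taiyuan University of Technology, master's thesis, 2015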
[Abstract]: Speaker recognition is a broad form of speech recognition whose basic idea is to determine a speaker's identity from the characteristics of his or her speech. In recent years, as science and technology have advanced, the demands placed on speaker recognition in many fields have kept rising, which confronts the technology with considerable difficulties. On the one hand, the feature parameters used for speaker recognition vary with the speaker's physical condition, emotional state, and speaking environment. On the other hand, speaker recognition is concerned not with the semantic information in the speech signal but with the speaker-specific information it carries. Identifying a speaker accurately therefore requires separating the semantic information from the speaker-specific information accurately, yet no current technique can separate the two completely. This thesis studies these problems.

MFCC parameters describe the spectral envelope of the signal, and the spectral envelope mainly characterizes the speaker's vocal tract and ignores the influence of the fundamental (pitch) frequency on the features. To address this, the thesis proposes an improved algorithm: when extracting MFCCs, the signal spectrum is not passed directly through the Mel filter bank; instead, it is first smoothed with a moving-average filter to obtain an approximation of the spectral envelope, and the result is then filtered by the Mel filter bank. Building on this, a multitaper spectrum estimate replaces the Hamming-windowed DFT when computing the signal spectrum, yielding a new feature parameter, MTSMFCC. Experiments show that a speaker recognition system based on MTSMFCC is more robust both to noise and to the passage of time.
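The MTSMFCC extraction steps described above can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the sine-taper family and uniform taper weighting for the multitaper estimate, the 5-bin moving-average window, smoothing the amplitude (rather than power) spectrum, the 0.97 pre-emphasis, and the frame, filter, and cepstrum sizes are all assumptions, and every function name is hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters equally spaced on the Mel scale (conventional MFCC bank)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / (right - center)
    return fb

def sine_tapers(frame_len, n_tapers):
    """Orthonormal sine tapers (assumed taper family; Thomson DPSS tapers are another option)."""
    n = np.arange(1, frame_len + 1)
    k = np.arange(1, n_tapers + 1)[:, None]
    return np.sqrt(2.0 / (frame_len + 1)) * np.sin(np.pi * k * n / (frame_len + 1))

def multitaper_power_spectrum(frame, n_fft, n_tapers=6):
    """Mean of the tapered periodograms (uniform weights, an assumption); replaces Hamming + DFT."""
    tapered = sine_tapers(len(frame), n_tapers) * frame        # (n_tapers, frame_len)
    spectra = np.abs(np.fft.rfft(tapered, n=n_fft, axis=1)) ** 2
    return spectra.mean(axis=0)                                 # (n_fft // 2 + 1,)

def moving_average(spectrum, width=5):
    """Moving-average filter along frequency; edges are implicitly zero-padded."""
    return np.convolve(spectrum, np.ones(width) / width, mode="same")

def dct_ii(x, n_ceps):
    """Unnormalised DCT-II, keeping the first n_ceps coefficients."""
    n = len(x)
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * np.arange(n) + 1) / (2 * n))
    return basis @ x

def frame_signal(signal, frame_len, hop):
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]

# All default parameter values below are hypothetical; the abstract does not give them.
def mtsmfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
            n_filters=26, n_ceps=13, n_tapers=6, smooth_width=5):
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])  # conventional pre-emphasis
    fb = mel_filterbank(n_filters, n_fft, sr)
    feats = []
    for frame in frame_signal(emphasized, frame_len, hop):
        amplitude = np.sqrt(multitaper_power_spectrum(frame, n_fft, n_tapers))
        envelope = moving_average(amplitude, smooth_width)  # approximate spectral envelope
        log_mel = np.log(fb @ envelope + 1e-10)               # Mel filtering, then log
        feats.append(dct_ii(log_mel, n_ceps))
    return np.array(feats)                                    # (n_frames, n_ceps)

if __name__ == "__main__":
    x = np.random.default_rng(0).standard_normal(16000)  # 1 s of white noise at 16 kHz
    print(mtsmfcc(x).shape)  # (98, 13)
```

Because the moving average runs across frequency bins, a wider window removes more of the pitch-harmonic ripple but also blurs formant peaks, so the width would need tuning on real speech.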
To address the low recognition rate of a single feature parameter in noisy environments, the thesis performs three kinds of fusion based on the original MFCC. (1) So that the features fully reflect the dynamic characteristics of speech, the first-order difference (delta) MFCC is fused with the original MFCC, giving the parameter Fusion1. (2) To fully reflect the low-, mid-, and high-frequency information in speech, MFCC, IMFCC (inverted MFCC), and MidMFCC (mid-frequency MFCC) are fused, giving the parameter Fusion2. (3) Building on these two, Fusion1 and Fusion2 are fused to obtain a new feature parameter, NMFCC. NMFCC not only conforms to the auditory characteristics of the human ear but also contains the low-, mid-, and high-frequency information of the speech signal, so it reflects the speaker's individual characteristics more fully. Experiments show that in noisy environments NMFCC achieves higher recognition rates than Fusion1 and Fusion2, by varying margins.
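The three fusion steps can likewise be sketched with NumPy, here modelling each fusion as frame-wise concatenation of feature vectors computed from a Hamming-windowed power spectrum. The abstract gives neither the fusion rule nor the MidMFCC frequency scale, so both are assumptions: the inverted filter bank is taken as the mirror image of the Mel filter bank, the mid-frequency bank uses a hypothetical cubic warping that packs filters around a quarter of the sampling rate, and NMFCC keeps the shared MFCC block only once. All function names and parameter values are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def filter_edges(n_filters, sr, scale):
    """Edge/center frequencies (Hz) of n_filters triangular filters on [0, sr/2]."""
    nyq = sr / 2.0
    if scale == "mel":       # dense at low frequencies (MFCC)
        return mel_to_hz(np.linspace(0.0, hz_to_mel(nyq), n_filters + 2))
    if scale == "inverted":  # mirror image of the Mel spacing: dense at high frequencies (IMFCC)
        return nyq - filter_edges(n_filters, sr, "mel")[::-1]
    if scale == "mid":       # HYPOTHETICAL warp, dense around nyq/2; not the thesis's MidMFCC scale
        u = np.linspace(-1.0, 1.0, n_filters + 2)
        return nyq / 2.0 * (1.0 + 0.3 * u + 0.7 * u ** 3)
    raise ValueError(f"unknown scale: {scale}")

def triangular_bank(edges_hz, n_fft, sr):
    """Filter i rises from edges[i], peaks at edges[i+1], and falls to edges[i+2]."""
    freqs = np.arange(n_fft // 2 + 1) * sr / n_fft
    left, center, right = edges_hz[:-2, None], edges_hz[1:-1, None], edges_hz[2:, None]
    rising = (freqs - left) / (center - left)
    falling = (right - freqs) / (right - center)
    return np.maximum(0.0, np.minimum(rising, falling))  # (n_filters, n_fft // 2 + 1)

def power_spectrogram(signal, frame_len=400, hop=160, n_fft=512):
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2

def cepstra(power_spec, bank, n_ceps=13):
    """Log filter-bank energies followed by an unnormalised DCT-II."""
    log_e = np.log(power_spec @ bank.T + 1e-10)  # (frames, filters)
    n = bank.shape[0]
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * np.arange(n) + 1) / (2 * n))
    return log_e @ basis.T                        # (frames, n_ceps)

def delta(feats, width=2):
    """First-order regression (delta) coefficients over +/- width frames, edge-padded."""
    t = len(feats)
    padded = np.pad(feats, ((width, width), (0, 0)), mode="edge")
    num = sum(k * (padded[width + k:t + width + k] - padded[width - k:t + width - k])
              for k in range(1, width + 1))
    return num / (2 * sum(k * k for k in range(1, width + 1)))

def fused_features(signal, sr=16000, n_fft=512, n_filters=26, n_ceps=13):
    spec = power_spectrogram(signal, n_fft=n_fft)
    ceps = {s: cepstra(spec, triangular_bank(filter_edges(n_filters, sr, s), n_fft, sr), n_ceps)
            for s in ("mel", "inverted", "mid")}
    mfcc = ceps["mel"]
    fusion1 = np.hstack([mfcc, delta(mfcc)])                     # MFCC + delta MFCC
    fusion2 = np.hstack([mfcc, ceps["inverted"], ceps["mid"]])   # MFCC + IMFCC + MidMFCC
    nmfcc = np.hstack([fusion1, ceps["inverted"], ceps["mid"]])  # both, MFCC kept once
    return fusion1, fusion2, nmfcc

if __name__ == "__main__":
    f1, f2, nm = fused_features(np.random.default_rng(0).standard_normal(16000))
    print(f1.shape, f2.shape, nm.shape)  # (98, 26) (98, 39) (98, 52)
```

In this configuration concatenation grows each frame vector from 13 to 52 dimensions; how the thesis handles the larger dimensionality is not stated in the abstract.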
[Degree-granting institution]: Taiyuan University of Technology
[Degree level]: Master's
[Year conferred]: 2015
[Classification number]: TN912.34