Research on Speaker Recognition Based on Fused Features and the Gaussian Mixture Model

Published: 2018-02-01 12:48

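Keywords: speaker recognition; feature extraction; fused feature; Gaussian mixture model　Source: East China University of Technology (东华理工大学), 2017 master's thesis　Document type: degree thesis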

[Abstract]: With the rapid development of the mobile Internet, convenient and fast biometric authentication is attracting growing attention. Speaker recognition, also known as voiceprint recognition, is one such technology. Because it is more convenient, more secure, and more economical than the alternatives, it has found an opportunity to develop and is widely applied in communication networks, commercial transactions, forensic identification, and military security. The key to speaker recognition lies in feature extraction and model training. Commonly used speech features include linear prediction cepstral coefficients (LPCC), Mel-frequency cepstral coefficients (MFCC), and perceptual linear prediction (PLP). MFCC and PLP outperform LPCC in recognition, but neither of them, used as a single feature, meets the high-accuracy requirements of some fields. The Gaussian mixture model (GMM) is the most widely used model because it is simple and performs well, but its simple method of choosing initial parameters leaves it unstable and insufficiently accurate. This thesis therefore proposes a new fused feature, PLP-MFCC, and an improved GMM to raise recognition performance. The main work is as follows. First, the thesis introduces the feature parameters commonly used in speaker recognition (LPCC, MFCC, and PLP) and how they are extracted. After analyzing two methods for evaluating feature parameters, the F-ratio and the D-ratio, it uses both to select a subset of the MFCC and PLP components and fuses them into the new feature PLP-MFCC. Second, it studies GMM modeling. Training a GMM amounts to estimating its parameters, and the choice of initial parameters affects the accuracy of the trained model. Initial parameters are usually chosen at random or with the K-means algorithm. After analyzing the problems of both approaches, the thesis applies an improved K-means algorithm to GMM parameter estimation. Finally, a GMM-based speaker recognition system is built on the Matlab platform, and experiments are run on the standard TIMIT speech corpus. Compared with MFCC and PLP, the fused feature PLP-MFCC raises the recognition rate by 4.9% and 3.8% respectively. Compared with the original GMM, the improved GMM raises the recognition rate by 2.3%. The experimental results show that the proposed fused feature and the improved GMM both contribute to the performance of a speaker recognition system.
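The F-ratio selection step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's exact formula, so this assumes the common definition of the F-ratio: the variance of the per-speaker means divided by the average within-speaker variance, computed separately for each feature dimension. The function names `f_ratio`, `top_k`, and `fuse` are hypothetical, and the D-ratio step is not shown.

```python
from statistics import fmean, pvariance


def f_ratio(frames_by_speaker):
    """Per-dimension F-ratio (assumed definition, not taken from the thesis):
    variance of the speaker means / mean of the within-speaker variances."""
    speakers = list(frames_by_speaker.values())
    n_dims = len(speakers[0][0])
    ratios = []
    for d in range(n_dims):
        # One column of values per speaker for dimension d.
        columns = [[frame[d] for frame in frames] for frames in speakers]
        between = pvariance([fmean(col) for col in columns])
        within = fmean([pvariance(col) for col in columns])
        ratios.append(between / within if within > 0 else float("inf"))
    return ratios


def top_k(ratios, k):
    """Indices of the k dimensions with the largest ratio, in index order."""
    ranked = sorted(range(len(ratios)), key=lambda i: ratios[i], reverse=True)
    return sorted(ranked[:k])


def fuse(plp_frame, mfcc_frame, plp_idx, mfcc_idx):
    """Concatenate the selected PLP and MFCC components into one vector."""
    return [plp_frame[i] for i in plp_idx] + [mfcc_frame[i] for i in mfcc_idx]
```

A dimension whose values separate speakers well (large spread between speaker means, small spread within each speaker) gets a high ratio and is kept. Applying `f_ratio` and `top_k` to the PLP and MFCC frames separately, then calling `fuse` on each frame pair, yields one fused vector per frame.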
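[Degree-granting institution]: East China University of Technology (东华理工大学)
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TN912.34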
