Research on Text-Independent Speaker Recognition Based on I-VECTOR
Topics: speech signal + GMM-UBM model; Source: Lanzhou University of Technology, 2017 master's thesis
[Abstract]: Speaker recognition, a form of biometric recognition, has gradually gained acceptance thanks to its convenience and non-interactive nature, and has become a research hotspot in biometrics. Text-independent speaker recognition extracts information that reflects individual characteristics from the speech signal in order to identify and verify a speaker's identity. As the technology has matured in recent years, speaker recognition has moved into real-world applications, but in practice its accuracy still suffers from the acoustic environment, the variety of recording devices, and the varying length of the speaker's utterances. To address the low accuracy caused by short test utterances and by environmental mismatch, this thesis studies, from the perspective of compensation, the Gaussian model, the i-vector model, and the Gaussian probabilistic linear discriminant analysis (GPLDA) model.

First, the thesis introduces speaker recognition models and discusses preprocessing and feature extraction, using Mel-frequency cepstral coefficients (MFCCs) as speaker features. To cope with insufficient training and test speech, a GMM-UBM model is built; its principles and modeling procedure are described, and the strengths and weaknesses of the system are analyzed. Experiments are used to select the number of mixture components, to study how differential MFCC features, which reflect the speaker's dynamic and static characteristics, affect recognition, and to evaluate the performance of the system.

Second, because GMM-UBM performs poorly across channels, a speaker verification system is built on identity vectors (i-vectors) derived from factor analysis. To counter channel mismatch, linear discriminant analysis (LDA) and within-class covariance normalization (WCCN) are applied as compensation, and the effect of each method on the system is analyzed. Experiments also examine how the i-vector dimensionality affects the system, and a suitable dimensionality is selected.

Finally, to address the low accuracy of text-independent speaker verification on short utterances of variable length, the thesis adopts the GPLDA model. When i-vectors are passed to the PLDA model they are length-normalized, and as a result the back-end covariance of the length-normalized i-vectors cannot be computed exactly, which harms the robustness of the system. The thesis proposes normalizing the column vectors of the total variability space instead of length-normalizing the i-vectors. Experimental validation shows that the proposed method improves the robustness of the system without reducing the recognition rate.
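
The contrast drawn at the end of the abstract, between length-normalizing each i-vector and normalizing the columns of the total variability matrix, can be illustrated with a minimal sketch. It assumes the matrix is stored as a list of rows and fills it with random placeholder values; the names `length_normalize`, `normalize_columns`, `T`, and `w` are hypothetical and do not come from the thesis, whose actual procedure may differ.

```python
import math
import random


def length_normalize(w):
    """Scale a vector (e.g. an i-vector) to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in w))
    return [x / norm for x in w]


def normalize_columns(T):
    """Scale every column of T (a list of rows) to unit Euclidean length."""
    n_rows, n_cols = len(T), len(T[0])
    norms = [
        math.sqrt(sum(T[r][c] ** 2 for r in range(n_rows)))
        for c in range(n_cols)
    ]
    return [[T[r][c] / norms[c] for c in range(n_cols)] for r in range(n_rows)]


def column_norms(T):
    """Return the Euclidean norm of each column of T."""
    n_rows, n_cols = len(T), len(T[0])
    return [
        math.sqrt(sum(T[r][c] ** 2 for r in range(n_rows)))
        for c in range(n_cols)
    ]


# Placeholder data (assumption): a 12 x 4 "total variability" matrix and a
# 4-dimensional "i-vector". Real systems use far larger dimensions.
rng = random.Random(0)
T = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(12)]
w = [rng.gauss(0.0, 1.0) for _ in range(4)]

w_unit = length_normalize(w)   # conventional per-i-vector step
T_unit = normalize_columns(T)  # alternative: normalize the matrix once
```

In this sketch the column normalization is applied once to the matrix, whereas length normalization has to be applied to every extracted i-vector; how the normalized matrix is then used during i-vector extraction is not specified in the abstract and is left out here.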
[Degree-granting institution]: Lanzhou University of Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TN912.3
Article ID: 1879264
Article link: https://www.wllwen.com/kejilunwen/xinxigongchenglunwen/1879264.html