Research on SVM-Based Text-Independent Speaker Identification
Published: 2018-05-21 02:32
Topic: speaker recognition + Gaussian mixture model; Source: Nanjing University of Posts and Telecommunications, master's thesis, 2017
[Abstract]: Speech is the most effective means of human communication, and because it differs from person to person it forms the basis of speaker recognition technology. Within the basic speaker recognition framework, finding highly discriminative speaker-specific features that yield better system performance is a current research focus. Model selection and feature extraction are the two central concerns in speaker recognition; once the model is fixed, system performance depends mainly on the type of feature parameters chosen. In today's digital era, finding superior speaker-specific features has both theoretical and practical significance. The goal of this thesis is to design speech features that improve the recognition performance of a speaker recognition system or reduce its time complexity. To this end, it studies the properties of the GMM supervector in speaker recognition systems, proposes a recombined supervector on that basis, and analyzes the feasibility of the recombined supervector in light of the properties of the support vector machine (SVM). It then turns to deep learning, which has attracted wide attention in recent years, and designs a deep neural network (DNN) to extract bottleneck features from speaker speech. The main work and contributions are as follows:

(1) The thesis introduces the basic framework of speaker recognition, covering speech preprocessing, feature extraction, and speaker recognition models. It describes in detail the extraction of LPC, MFCC, and their cepstral features, and analyzes their properties. It also reviews several classical speaker recognition methods: template matching, hidden Markov models, vector quantization, Gaussian mixture models (GMM), support vector machines, and deep neural networks. Preliminary study showed that the last three perform comparatively better in speaker recognition systems, so the work in this thesis is built on those three methods.

(2) To address the unsatisfactory performance of the conventional supervector in speaker identification systems, the thesis proposes a text-independent GMM-SVM speaker identification system built on the recombined supervector. The recombined supervector exploits the strong correlation between the mean vectors of adjacent Gaussian components, and the mean vector of each Gaussian component carries sufficient speaker-specific information. The recombined supervector reflects the fine detail of speaker identity and lets the system take full advantage of the SVM's strength on high-dimensional, small-sample data. Experimental results show that, compared with the conventional GMM-SVM speaker identification system, the GMM-SVM system based on the recombined supervector effectively raises the identification rate while greatly shortening system modeling time.

(3) To address the inability of conventional feature parameters to capture the deep structural information of speech signals, the thesis designs a deep neural network that extracts bottleneck features from speaker speech and builds a DNN-SVM speaker identification system on them. These features capture deep characteristics of the speaker and are both invariant and highly discriminative. Experimental results show that the DNN-SVM speaker identification system clearly outperforms the SVM-based speaker identification system.
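As a rough illustration of the conventional GMM mean supervector that contribution (2) takes as its starting point, the sketch below MAP-adapts the means of a universal background model (UBM) to one speaker's frames and stacks them into a single vector. It does not reproduce the thesis's recombined supervector, whose construction the abstract does not specify. The function names, the toy two-component UBM, the sample frames, and the relevance factor of 16 are all assumptions made for illustration.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def posteriors(x, weights, means, variances):
    """Posterior probability of each UBM component given frame x."""
    logs = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]

def map_supervector(frames, weights, means, variances, relevance=16.0):
    """MAP-adapt the UBM means to the frames and concatenate them."""
    n_comp, dim = len(means), len(means[0])
    counts = [0.0] * n_comp                      # soft frame counts
    sums = [[0.0] * dim for _ in range(n_comp)]  # posterior-weighted sums
    for x in frames:
        for c, p in enumerate(posteriors(x, weights, means, variances)):
            counts[c] += p
            for d in range(dim):
                sums[c][d] += p * x[d]
    supervector = []
    for c in range(n_comp):
        for d in range(dim):
            # Interpolate between the speaker statistics and the UBM mean.
            adapted = (sums[c][d] + relevance * means[c][d]) / (counts[c] + relevance)
            supervector.append(adapted)
    return supervector

# Toy UBM (assumed values): 2 components, 2-dimensional features.
ubm_weights = [0.5, 0.5]
ubm_means = [[0.0, 0.0], [4.0, 4.0]]
ubm_vars = [[1.0, 1.0], [1.0, 1.0]]
speaker_frames = [[0.5, -0.5], [4.5, 3.5], [3.5, 4.5]]

sv = map_supervector(speaker_frames, ubm_weights, ubm_means, ubm_vars)
print(len(sv), [round(v, 3) for v in sv])
```

In a GMM-SVM system, one such vector per utterance would serve as the input to the SVM. With M components and D-dimensional features, the vector has M × D entries, which is why this representation is high-dimensional even when only a few utterances per speaker are available.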
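[Degree-granting institution]: Nanjing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TN912.3; TP18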
Article ID: 1917352
Article link: https://www.wllwen.com/kejilunwen/xinxigongchenglunwen/1917352.html