
A Partial Least Squares Based Total Variability Space Estimation Method for I-Vector Speaker Recognition

Published: 2018-04-24 11:46

  Topics: speaker recognition + i-vector; Source: Harbin Institute of Technology, master's thesis, 2015


[Abstract]: As a key multimedia data analysis technology, speaker recognition is widely used in transaction access control, identity authentication, law enforcement, speech data management, audio surveillance, and related fields. The i-vector approach is an effective speaker recognition technique whose performance exceeds that of traditional methods, and it has therefore attracted wide attention in the field. The core step of i-vector speaker recognition is the estimation of the total variability space. Existing estimation methods, however, extract features solely by exploiting the relationships among the feature vectors themselves and ignore an important piece of prior knowledge: the speaker class information. Because class information matters greatly for the classification and prediction of samples, existing total variability space estimation methods are not optimal. Starting from class information, this thesis proposes a total variability space estimation method based on partial least squares (PLS). First, a Gaussian Mixture Model-Universal Background Model (GMM-UBM) is trained to obtain the GMM mean supervector of each speaker. Next, the GMM mean supervectors and the class information are used to estimate the total variability space and extract the speaker i-vectors. Finally, Within-Class Covariance Normalization (WCCN) is applied for channel compensation, and cosine distance scoring is used as the decision method. Experimental results show clear improvements in recognition performance on the King-ASR-009 database and the NIST 2008 database (tasks short2-short3 and 8conv-short3). Because partial least squares is insensitive to similar points between classes but relatively sensitive to outliers, system performance often degrades when such points appear in the training samples. To address this, the thesis proposes a total variability space estimation method based on regression-penalized partial least squares: the training corpus is split into two parts, one used to train the initial total variability space and the other used for the regression penalty. Experimental results show that both speaker verification and speaker identification performance on the King-ASR-009 database improve.
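
The pipeline described in the abstract (a label-aware projection, then WCCN, then cosine scoring) can be illustrated with a small NumPy sketch. This is not the thesis's algorithm: the projection below uses the simple PLS-SVD variant (left singular vectors of the cross-covariance between centered data and one-hot speaker labels) as an assumed stand-in for the PLS-based total variability estimate, the random "supervectors" are synthetic toy data rather than GMM-UBM output, and the function names `pls_projection`, `within_class_cov`, `wccn_matrix`, and `cosine_score` are hypothetical.

```python
import numpy as np

def pls_projection(X, labels, n_components):
    """PLS-SVD stand-in: directions in X-space that covary most with labels."""
    classes = np.unique(labels)
    Y = (labels[:, None] == classes[None, :]).astype(float)  # one-hot labels
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Left singular vectors of the cross-covariance matrix X^T Y
    U, _, _ = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
    return U[:, :n_components]  # shape (feature_dim, n_components)

def within_class_cov(vectors, labels):
    """Average over speakers of each speaker's covariance matrix."""
    classes = np.unique(labels)
    dim = vectors.shape[1]
    W = np.zeros((dim, dim))
    for c in classes:
        Vc = vectors[labels == c]
        centered = Vc - Vc.mean(axis=0)
        W += centered.T @ centered / len(Vc)
    return W / len(classes)

def wccn_matrix(vectors, labels, eps=1e-6):
    """Return B with B @ B.T = inv(W); eps is an assumed small regularizer."""
    W = within_class_cov(vectors, labels) + eps * np.eye(vectors.shape[1])
    return np.linalg.cholesky(np.linalg.inv(W))

def cosine_score(a, b, B):
    """Cosine similarity between two vectors after WCCN projection."""
    a, b = B.T @ a, B.T @ b
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy data (assumption): 4 speakers x 10 utterances, 20-dim "supervectors"
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(4), 10)
speaker_means = rng.normal(size=(4, 20))
X = speaker_means[labels] + 0.3 * rng.normal(size=(40, 20))

T = pls_projection(X, labels, n_components=3)   # label-aware subspace
vecs = (X - X.mean(axis=0)) @ T                 # low-dimensional vectors
B = wccn_matrix(vecs, labels)                   # channel compensation
same = cosine_score(vecs[0], vecs[1], B)        # same-speaker trial
diff = cosine_score(vecs[0], vecs[10], B)       # different-speaker trial
print(f"same-speaker score: {same:.3f}, different-speaker score: {diff:.3f}")
```

In a verification setting, a trial would be accepted when the score exceeds a threshold tuned on development data. The one-hot labels limit this variant to at most (number of speakers - 1) useful directions, which is one reason an actual system would likely use a different PLS formulation.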
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2015
[Classification number]: TN912.34





Link to this article: https://www.wllwen.com/kejilunwen/wltx/1796491.html

