Research on a Sparse-Representation-Based Speaker Recognition System in Noisy Environments

Published: 2018-09-01 11:23

[Abstract]: Speaker recognition, a form of voiceprint recognition, has broad prospects as pattern-recognition applications develop rapidly. Compared with recognition methods based on other personal biometric traits, it is convenient to operate and needs only inexpensive equipment, so research on it has attracted wide attention in recent years. The model most commonly used for speaker recognition today is the Gaussian mixture model with a universal background model (GMM-UBM), which is trained from a universal background model. It is more robust than other models, but it is computationally expensive and its recognition performance is still unsatisfactory, and many researchers have since proposed improvements to it. In recent years, sparse representation algorithms have performed remarkably well in signal processing and have already achieved good results in image recognition, processing, and separation. Sparse representation can also be introduced into the matching and recognition module as a classification algorithm to improve a speaker recognition system, with the aim of using its properties to address the sharp drop in recognition performance that such systems suffer as soon as noise is present.

The main work of this thesis is as follows. First, the sparse representation algorithm is introduced into the speaker recognition model, and its classification property is used to improve the model's matching and recognition method: the speaker is identified by finding the minimum reconstruction error. Second, to meet the requirements of the sparse representation algorithm, the composition of the dictionary is designed, using GMM mean supervectors, currently the most widely used choice, as dictionary atoms. Because the supervectors are high-dimensional, the Fisher discriminant ratio is used to compare the discriminative power of each dimension of the dictionary, and rules are formulated to control how far the dictionary's dimension is reduced. In addition, an identity matrix I is appended to the dictionary to improve noise robustness. Simulations show that incorporating sparse representation into the speaker model yields better recognition performance, and that the proposed I-Fisher algorithm both reduces the dictionary's dimension and improves recognition and noise robustness. This model is well suited to cases where the test and training speech are recorded in the same environment, that is, under the same noise conditions, where it recognizes speakers well. Covering a range of noise environments, however, requires training multiple dictionaries, which is computationally expensive.

Next, to address the drop in recognition rate across different noise environments, a new sparse-representation-based dictionary construction method is proposed to counter the effect of noise. Following the principle of morphological component analysis (MCA), the speaker dictionary is trained on clean speech, and a noise dictionary is added so that the resulting sparse representation coefficients can be separated into a clean-speech part and a noise part. The reconstruction error is computed from the clean-speech coefficients only, which excludes the influence of the noise during recognition. To obtain dictionaries that meet these design requirements, the K-SVD dictionary learning method is used to train the two dictionaries separately, and they are then concatenated, so that the noise dictionary becomes part of one large dictionary, together with the speaker dictionary, over which the sparse decomposition is computed. The thesis further proposes applying the same decomposition to the noisy test speech and using the noise extracted and reconstructed from it to update the noise dictionary. Simulations show that the algorithm effectively reduces the effect of noise on the recognition rate when the test and training speech have different ambient noise.

In summary, the thesis proposes two sparse-representation-based recognition models for noisy environments. For the first, the way the dictionary atoms are constructed is improved and optimized, and experiments determine the recognition environments to which it is suited. For the second, a dictionary design based on a noise dictionary is proposed and the noise dictionary is updated. Both methods achieve good recognition results.
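
The ideas named in the abstract (a Fisher discriminant ratio per dictionary dimension, a dictionary extended with an identity matrix or a noise dictionary, and identification by the smallest per-speaker reconstruction error) can be illustrated with a small sketch. This is not the thesis's implementation: the function names `fisher_ratio`, `omp`, and `src_identify` are hypothetical, a plain orthogonal matching pursuit stands in for whatever sparse solver the author used, and random vectors stand in for GMM mean supervectors.

```python
import numpy as np


def fisher_ratio(X, labels):
    """Fisher discriminant ratio of each column of X (rows are samples)."""
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(labels):
        Xc = X[labels == c]
        class_mean = Xc.mean(axis=0)
        between += len(Xc) * (class_mean - overall_mean) ** 2
        within += ((Xc - class_mean) ** 2).sum(axis=0)
    return between / (within + 1e-12)  # small constant avoids division by zero


def omp(D, y, n_nonzero):
    """Orthogonal matching pursuit: approximate y with at most n_nonzero columns of D.

    Assumes n_nonzero >= 1 and roughly unit-norm columns.
    """
    residual = y.copy()
    support = []
    coef = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:  # nothing new to add
            break
        support.append(k)
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol
    coef[support] = sol
    return coef


def src_identify(atoms, atom_labels, y, noise_atoms=None, n_nonzero=5):
    """Pick the speaker whose atoms best reconstruct y.

    atoms: (dim, n_atoms) speaker atoms as columns; atom_labels: integer speaker
    id per atom. noise_atoms defaults to the identity matrix; a learned noise
    dictionary (assumed to have unit-norm columns) could be passed instead.
    """
    atoms = atoms / np.linalg.norm(atoms, axis=0, keepdims=True)
    if noise_atoms is None:
        noise_atoms = np.eye(atoms.shape[0])
    n_speaker = atoms.shape[1]
    coef = omp(np.hstack([atoms, noise_atoms]), y, n_nonzero)
    x, e = coef[:n_speaker], coef[n_speaker:]
    cleaned = y - noise_atoms @ e  # remove the part explained by noise atoms
    residuals = {}
    for c in np.unique(atom_labels):
        x_c = np.where(atom_labels == c, x, 0.0)  # keep only speaker c's coefficients
        residuals[int(c)] = float(np.linalg.norm(cleaned - atoms @ x_c))
    return min(residuals, key=residuals.get), residuals


# Synthetic stand-in data: 3 speakers, 4 atoms each, 40 dimensions.
rng = np.random.default_rng(0)
dim, per_speaker = 40, 4
means = rng.normal(size=(dim, 3))
atom_labels = np.repeat(np.arange(3), per_speaker)
atoms = means[:, atom_labels] + 0.3 * rng.normal(size=(dim, 3 * per_speaker))

y = atoms[:, 5].copy()  # a test vector from speaker 1 ...
y[7] += 5.0             # ... corrupted by one large spike
speaker, residuals = src_identify(atoms, atom_labels, y)
print(speaker, residuals)

# Rank dimensions by Fisher ratio; the top-ranked ones would be kept.
ranking = np.argsort(fisher_ratio(atoms.T, atom_labels))[::-1]
```

Passing `noise_atoms=None` gives the identity-extended dictionary of the first model, while passing a separately trained noise dictionary corresponds loosely to the second, MCA-style model. How many dimensions to keep from `ranking` is left open here, since the abstract does not state the rule the thesis uses.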
[Degree-granting institution]: Lanzhou Jiaotong University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TN912.3

[References]