Voiceprint Feature Extraction and Recognition in Complex Backgrounds

Published: 2018-11-02 09:31

【Abstract】: With the rapid development of the Internet and information technology, voiceprint recognition has broad application value and prospects in identity confirmation for remote customer services (finance, securities, social security, e-commerce, banking) and in the automatic detection and authentication of specific individuals by public security and military security departments. It is an important research direction in audio signal processing and in biometric information detection and recognition. Research in this field has made major progress over recent decades, but because a speaker's individual characteristics are easily affected by external factors, and because real environments are complex and variable, a bottleneck has gradually become apparent. Studying effective speech detection methods and more robust feature extraction algorithms in complex backgrounds is therefore of great importance for improving the recognition rate of a system. Voiceprint recognition in complex backgrounds works under highly complex noise: speech is detected, features are extracted from it, a recognition model is built after analysis and processing, and the model is then applied to identify the speaker. This thesis mainly studies speech endpoint detection and feature extraction methods in order to improve recognition efficiency. The main work is as follows. First, for the audio preprocessing stage, two speech endpoint detection methods for noisy environments are proposed. Depending on the signal-to-noise ratio (SNR) of the background, either an endpoint detection algorithm based on spectral entropy or a dual-threshold endpoint detection algorithm based on short-time energy and zero-crossing rate is used. Experiments show that the dual-threshold algorithm based on short-time energy and zero-crossing rate performs better when the background SNR is high, and the algorithm based on spectral entropy performs better when the background SNR is low. Second, for the feature extraction stage, the pitch period is computed with the cepstrum method, and the power spectrum of the speech signal is converted into Mel-frequency cepstral coefficients (MFCC) through a Mel filter bank. An improved feature extraction algorithm then combines the two parameters into a single voiceprint feature, and each of them is also simulated experimentally on its own. Finally, for the voiceprint recognition stage, a recognition algorithm for noisy features (SEMG) is proposed: in a complex background, speech endpoints are detected with the spectral-entropy-based endpoint detection algorithm, features are extracted with the improved feature extraction algorithm, and a Gaussian mixture model (GMM) is built for each speaker. Experiments verify the effectiveness of the SEMG algorithm, with satisfactory results.
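The three per-frame quantities that the two endpoint detection algorithms rely on (short-time energy, zero-crossing rate, and spectral entropy) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names, the naive DFT, the use of base-2 entropy, and the convention that a silent frame has entropy 0 are all choices made here for the example. The abstract does not give frame lengths or threshold values, so none are assumed.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split a sample list into overlapping frames (trailing partial frame dropped)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Sum of squared samples in the frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (needs len(frame) >= 2)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def spectral_entropy(frame):
    """Shannon entropy in bits of the normalized power spectrum (naive DFT)."""
    n = len(frame)
    power = []
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        power.append(abs(coeff) ** 2)
    total = sum(power)
    if total == 0.0:
        return 0.0  # assumption for this sketch: a silent frame has entropy 0
    probs = [p / total for p in power]
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


if __name__ == "__main__":
    # A pure tone concentrates power in one bin; an impulse spreads it evenly.
    tone = [math.sin(math.pi * t / 2) for t in range(8)]
    impulse = [1.0] + [0.0] * 7
    for name, frame in (("tone", tone), ("impulse", impulse)):
        print(name, short_time_energy(frame), zero_crossing_rate(frame),
              spectral_entropy(frame))
```

In this sketch a frame whose power is concentrated in a few frequency bins yields low spectral entropy, while a frame with a flat spectrum yields high entropy. Because that measure depends on the shape of the spectrum rather than on amplitude, it is plausible that it degrades more gracefully than an energy threshold at low SNR, which is consistent with the experimental finding reported in the abstract.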
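【Degree-granting institution】: Central South University of Forestry and Technology
【Degree level】: Master's
【Year degree awarded】: 2014
【Classification number】: TN912.34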
