Research and Application of Speech Endpoint Detection Algorithms
Published: 2018-08-23 19:51
[Abstract]: Speech endpoint detection, also known as voice activity detection (VAD), determines whether speech is present in a noisy speech signal. It is commonly used in speech processing systems such as speech coding and speech enhancement, where it lowers the coding bit rate, reduces the communication bandwidth required, improves the efficiency of mobile devices, and helps recognize speech information accurately. Speech signal analysis must first examine the noisy input signal and accurately locate the segments that carry useful information, which reduces the amount of data to be processed and improves processing efficiency. The traditional dual-threshold endpoint detection algorithm is highly accurate in noise-free environments, but its accuracy is low in real noisy environments, especially at low signal-to-noise ratio (SNR). This thesis improves the wavelet energy entropy endpoint detection algorithm by taking the speaker's gender as prior information. Experimental data show that the improved wavelet energy entropy algorithm effectively raises the accuracy of endpoint detection. The main contents and results of this thesis are as follows:
1. A speech signal analysis method based on statistics of speech attributes is proposed. Existing speech analysis methods focus mainly on features such as short-time energy, short-time zero-crossing rate, pitch period, formant frequency, and Mel cepstral coefficients. Based on the pronunciation characteristics of different speakers, this thesis analyzes the speech signal along multiple dimensions, including short-time energy variance, Mel cepstral distance variance, and MFCC cepstral distance variance. The 239-dimensional data extracted from the speech signal are reduced in dimension with the Relief [1] feature selection algorithm to build a reasonable feature set. Experiments show that introducing the speech attribute statistics clearly improves the accuracy of speech information recognition.
2. Based on the pronunciation characteristics of speakers of different genders, the concept of a fuzzy membership function is introduced to detect the speaker's gender from the speech signal. A fuzzy membership function model is built from the pitch frequency contours of speakers of different genders, and this model gives a preliminary decision on the speaker's gender. Building on the analysis of gender fuzzy membership, speech files whose speaker gender cannot be identified reliably are further classified with a decision tree model. Experiments show that at low SNR this hybrid model considerably improves speaker gender recognition and performs well.
3. Given accurate recognition of the speaker's gender, the thesis analyzes the strengths and weaknesses of the wavelet algorithm and the wavelet energy entropy algorithm in speech endpoint detection, and improves the accuracy of the wavelet energy entropy algorithm. Finally, the improved wavelet energy entropy algorithm is tested and analyzed in simulation experiments on noisy speech files. Experimental data show that under different background noises at an SNR of 5 dB, the algorithm accurately separates speech segments from non-speech segments, significantly reduces the amount of information lost, and considerably improves accuracy.
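The abstract does not spell out how the wavelet energy entropy is computed, so the following is only a minimal sketch of one common definition, not the thesis's improved algorithm. It assumes a Haar wavelet, a three-level decomposition, and a Shannon entropy taken over the normalized sub-band energies of each frame. The function names, frame length, and hop size are all hypothetical choices made for illustration.

```python
import math


def haar_step(samples):
    """One level of the orthonormal Haar transform -> (approximation, detail).

    A trailing unpaired sample is dropped, so frame lengths should be a
    multiple of 2**levels. (Haar is an assumption; the abstract does not
    name the wavelet used.)
    """
    scale = math.sqrt(2.0)
    pairs = range(0, len(samples) - 1, 2)
    approx = [(samples[i] + samples[i + 1]) / scale for i in pairs]
    detail = [(samples[i] - samples[i + 1]) / scale for i in pairs]
    return approx, detail


def subband_energies(frame, levels=3):
    """Energies of the detail bands d1..dJ, then the final approximation band."""
    energies = []
    approx = list(frame)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        energies.append(sum(c * c for c in detail))
    energies.append(sum(c * c for c in approx))
    return energies


def wavelet_energy_entropy(frame, levels=3):
    """Shannon entropy (in nats) of the normalized sub-band energies."""
    energies = subband_energies(frame, levels)
    total = sum(energies)
    entropy = 0.0
    if total > 0.0:
        for energy in energies:
            if energy > 0.0:
                p = energy / total
                entropy -= p * math.log(p)
    return entropy


def frame_entropies(signal, frame_len=256, hop=128, levels=3):
    """Entropy of each frame; frame_len and hop are illustrative values."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    return [wavelet_energy_entropy(signal[s:s + frame_len], levels) for s in starts]
```

For example, a unit impulse of length 8 splits its energy across the four sub-bands as roughly 0.5, 0.25, 0.125, and 0.125, giving an entropy of 1.75 ln 2 (about 1.213), whereas a constant frame puts all its energy in the approximation band and has zero entropy. An endpoint detector would compare such per-frame values against a threshold; how the thesis sets that threshold and uses the gender information is not stated in the abstract.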
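[Degree-granting institution]: Xi'an University of Architecture and Technology
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TN912.3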
Article ID: 2199738
Article link: https://www.wllwen.com/kejilunwen/xinxigongchenglunwen/2199738.html