Research on Speech Endpoint Detection Algorithms in Complex Noise Environments

Published: 2018-05-07 23:02

Topic: speech endpoint detection + speech enhancement; source: Donghua University, master's thesis, 2016

【Abstract】: Speech endpoint detection is a necessary step in speech analysis, speech synthesis, and speech recognition. Although endpoint detection achieves encouraging accuracy in quiet environments, system performance usually drops significantly in practical use because of added noise and changing conditions. To become practical, endpoint detection must overcome this robustness problem, which makes endpoint detection under low signal-to-noise ratio (SNR) noise especially important. Aiming at application-oriented endpoint detection and focusing on system robustness, this thesis studies in depth the endpoint detection of isolated words and continuous sentences in noisy environments. Through systematic research and experiments on robust endpoint detection, it forms a complete research framework for speech endpoint detection in complex noise environments, including the construction of a speech database, an adaptive filtering algorithm, and a delayed segmentation strategy based on classification criteria, and on this basis builds a complete speech endpoint detection system. The specific results are as follows.
⑴ Endpoint detection experimental system. The thesis studies the mathematical model of speech signals and the features of different speech signals and their extraction, collects the TIMIT standard clean speech corpus and the NOISEX-92 standard noise corpus, gives a metric for noise, and builds a noisy-speech mixing platform, which ensures the repeatability of the later experiments.
⑵ Speech enhancement algorithm. Conventional adaptive filtering algorithms cannot reconcile convergence speed, steady-state accuracy, and computational complexity. To address this, the Euclidean search algorithm is introduced and modified in several places, lowering the computational precision while greatly improving convergence speed and stability. Comparative experiments show that its performance is close to that of the RLS algorithm at a much lower computational cost. It also scores well under the MOS (mean opinion score) and SNR evaluation measures.
⑶ Endpoint detection algorithm. The commonly used double-threshold method, the spectral-entropy-based algorithm, and the fractal-theory-based algorithm are analyzed in detail. Permutation entropy is introduced as a nonlinear dynamics parameter that represents the nonlinear characteristics of speech signals well. A delayed segmentation strategy is proposed: coarse endpoints are determined using the energy-to-frequency ratio as the feature parameter; precise endpoints are then determined with a permutation entropy difference algorithm; the speech signal is segmented starting from the precise endpoints; and the resulting speech segments are checked against classification criteria to eliminate segmentation errors caused by noise.
⑷ System implementation. The interface of the whole endpoint detection system is implemented with the Matlab GUI tools, and the speech database from Chapter 2 is used for comparative experiments on the different endpoint detection methods. The experiments show that the proposed method detects endpoints better than the conventional double-threshold and spectral-entropy methods, especially at low SNR, where it roughly matches the fractal-based method. With filtering added, however, the proposed scheme far outperforms the other methods. Because permutation entropy is simple and easy to implement, the algorithm also performs very well in real time, and its computational complexity is far lower than that of the fractal method.
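As an illustration of the permutation entropy feature named in the abstract, the following is a minimal Python sketch of the standard (Bandt-Pompe) normalized permutation entropy computed frame by frame. It is not the thesis's implementation: the function names, the embedding order `order=3`, the delay `delay=1`, and the framing parameters `frame_len` and `hop` are all assumptions chosen for illustration, and the thesis's permutation entropy difference step and classification criteria are not reproduced here.

```python
import math
from collections import Counter


def permutation_entropy(frame, order=3, delay=1):
    """Normalized permutation entropy of one frame, in the range [0, 1].

    `order` and `delay` are illustrative defaults, not values from the thesis.
    """
    n_patterns = len(frame) - (order - 1) * delay
    if n_patterns <= 0:
        raise ValueError("frame too short for the chosen order and delay")
    counts = Counter()
    for i in range(n_patterns):
        window = [frame[i + j * delay] for j in range(order)]
        # Ordinal pattern: the index order that sorts the window.
        # sorted() is stable, so ties are broken by position.
        pattern = tuple(sorted(range(order), key=window.__getitem__))
        counts[pattern] += 1
    # Shannon entropy of the pattern distribution, written as p * log(1/p).
    entropy = sum((c / n_patterns) * math.log(n_patterns / c)
                  for c in counts.values())
    # Normalize by the maximum possible entropy, log(order!).
    return entropy / math.log(math.factorial(order))


def frame_entropies(signal, frame_len=256, hop=128, order=3, delay=1):
    """Permutation entropy of each full frame of `signal` (hypothetical framing)."""
    return [permutation_entropy(signal[start:start + frame_len], order, delay)
            for start in range(0, len(signal) - frame_len + 1, hop)]
```

With these definitions, a white-noise-like frame gives a value close to 1, while a slowly varying periodic frame gives a much lower value. A detector could track how this value changes from frame to frame, but the thresholds and decision rule would have to come from the thesis itself.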
【Degree-granting institution】: Donghua University
【Degree level】: Master's
【Year degree conferred】: 2016
【Classification number】: TN912.3

【Similar literature】