Research on VAD Methods in Noisy Environments

Published: 2018-01-26 10:57

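Keywords: voice activity detection; signal-to-noise ratio; cepstral distance; denoising autoencoder　Source: Xinjiang University, master's thesis, 2017　Document type: degree thesis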

[Abstract]: Voice activity detection (VAD) locates the start and end points of speech within an audio signal so that useful speech can be separated from non-speech segments and noise, which makes subsequent processing more efficient. It is widely used in speech recognition, speech enhancement, speech coding, and related tasks. Current endpoint detection research follows two main directions. The first is threshold-based detection, with common methods built on short-time energy and zero-crossing rate or on information entropy. The second is pattern-recognition-based detection, with common methods built on hidden Markov models (HMM) or support vector machines. The quality of endpoint detection has a decisive effect on subsequent speech processing. This thesis studies endpoint detection in noisy environments. Because traditional methods all suffer from low detection rates at low signal-to-noise ratio (SNR), the thesis first preprocesses the speech to denoise it and then applies the traditional cepstral-distance method for detection. For the denoising step, the thesis draws on deep learning, a research focus of recent years, and proposes using a denoising autoencoder (DAE) for speech denoising, with some success. Given the complex relationship between noise and speech, and because sound in everyday life is mostly corrupted by additive noise, the thesis concentrates on detection performance under different noise types and different SNRs. The experiments use three noises from the NOISEX-92 noise database (factory, volvo, and white) together with a subset of the clean speech corpus TIMIT, mixed into noisy speech at different SNRs for each noise type; five sets of noisy data were synthesized with SNRs between -10 dB and 10 dB. A DAE is then trained by gradient descent to reconstruct the noisy speech so that its error relative to the original clean speech is minimized, which achieves the denoising. The cepstral-distance method is then applied to detect the speech endpoints, improving detection accuracy at low SNR. The experimental results show that the accuracy of traditional endpoint detection methods drops sharply at low SNR, whereas the proposed method clearly improves detection accuracy, especially below 0 dB, where it outperforms the traditional algorithms.
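The noisy-speech synthesis step mentioned in the abstract (additive noise mixed with clean speech at a chosen SNR) can be sketched as below. This is a minimal illustration and not the thesis code: the function names `power`, `mix_at_snr`, and `measured_snr_db` are hypothetical, a sine tone and Gaussian samples stand in for TIMIT speech and the white-noise recording, and the even 5 dB spacing of the five SNR levels is an assumption, since the abstract only gives the range.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in signal) / len(signal)


def mix_at_snr(clean, noise, snr_db):
    """Return (clean + g * noise, g * noise), with g chosen to hit snr_db.

    From SNR_dB = 10 * log10(P_clean / (g^2 * P_noise)) it follows that
    g = sqrt(P_clean / (P_noise * 10^(SNR_dB / 10))).
    """
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[:len(clean)]
    gain = math.sqrt(power(clean) / (power(noise) * 10 ** (snr_db / 10)))
    scaled = [gain * n for n in noise]
    noisy = [c + n for c, n in zip(clean, scaled)]
    return noisy, scaled


def measured_snr_db(clean, scaled_noise):
    """SNR in dB of a clean signal against the noise that was added to it."""
    return 10 * math.log10(power(clean) / power(scaled_noise))


# Stand-in signals (assumptions): a 200 Hz tone in place of speech and
# Gaussian samples in place of a recorded white-noise file.
random.seed(0)
SAMPLE_RATE = 16000
clean = [math.sin(2 * math.pi * 200 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]
noise = [random.gauss(0.0, 1.0) for _ in range(SAMPLE_RATE)]

# Five levels between -10 dB and 10 dB; the 5 dB step is an assumption.
SNR_LEVELS_DB = [-10, -5, 0, 5, 10]
for snr_db in SNR_LEVELS_DB:
    noisy, scaled = mix_at_snr(clean, noise, snr_db)
    print(snr_db, round(measured_snr_db(clean, scaled), 6))
```

In a pipeline like the one the abstract describes, each `noisy` signal would be paired with its `clean` source as an input/target pair for training the denoising autoencoder, and the reconstructed output would then be passed to the cepstral-distance detector.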
[Degree-granting institution]: Xinjiang University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TN912.3

[Similar literature]