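Speech Endpoint Detection Based on Statistical Models
Published: 2018-04-13 02:29
Topics: speech endpoint detection + energy clustering; Source: Shanghai Normal University, master's thesis, 2017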
[Abstract]: Speech endpoint detection, also called voice activity detection (VAD), aims to identify the speech and non-speech segments of a speech signal. It is an important front-end step in many advanced speech processing applications, such as speech recognition, speaker (voiceprint) recognition, and speech transmission. Among speech endpoint detection systems, energy-based detection is the most widely used. It performs well in noise-free conditions, but its performance degrades considerably in noisy environments. Adaptive speech endpoint detection has many advantages over traditional energy-based detection. However, its single minimum energy threshold cannot adapt to different noise backgrounds. The first contribution of this thesis addresses this problem with a k-means-based average-energy clustering method that finds a more suitable minimum energy threshold for each utterance. In addition, median filtering is applied to smooth out interference caused by short-duration noise. Experiments on the NIST SRE2006 speaker recognition evaluation (SRE) data show that the proposed method outperforms both traditional energy-based VAD and adaptive VAD. Speech endpoint detection based on deep neural networks has recently become a research focus because it performs significantly better than other methods. The second contribution starts from a deep-neural-network-based speech endpoint detection method and addresses two of its weaknesses, poor performance at low signal-to-noise ratios and susceptibility to short-duration noise, by applying spectral-subtraction speech enhancement and adaptive median filtering, respectively. The thesis also proposes a supervised training rule that, by analogy with the way humans learn from easy to hard, significantly accelerates the convergence of the neural network. Experimental results on the AURORA2 database show that, compared with the baseline system, the improved method not only speeds up training but also achieves a 31.12% relative performance improvement.
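The abstract does not give the details of the energy-clustering method, so the following is only a minimal Python sketch of the general idea, not the thesis's implementation. It assumes 8 kHz audio with 25 ms frames and a 10 ms hop, clusters per-frame log energies into two groups with a plain 1-D k-means, takes the midpoint between the two cluster centres as the per-utterance threshold, and smooths the frame decisions with a fixed-width median filter. All function names, parameters, and the midpoint rule are assumptions made for illustration.

```python
import numpy as np


def frame_log_energy(signal, frame_len=200, hop=80):
    """Short-time log energy (dB) per frame; assumes len(signal) >= frame_len."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    energies = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        # Small constant avoids log(0) on silent frames.
        energies[i] = 10.0 * np.log10(np.mean(frame ** 2) + 1e-12)
    return energies


def kmeans_threshold(energies, n_iter=50):
    """Two-cluster 1-D k-means on frame energies.

    Assumption: the threshold is the midpoint of the two cluster centres.
    """
    low, high = energies.min(), energies.max()
    for _ in range(n_iter):
        # Assign each frame to the nearer centre.
        is_high = np.abs(energies - high) < np.abs(energies - low)
        if is_high.all() or not is_high.any():
            break  # one cluster is empty; keep the current centres
        new_low = energies[~is_high].mean()
        new_high = energies[is_high].mean()
        if new_low == low and new_high == high:
            break  # converged
        low, high = new_low, new_high
    return 0.5 * (low + high)


def median_smooth(decisions, width=5):
    """Sliding median over binary frame decisions (odd width, edge-padded)."""
    half = width // 2
    padded = np.pad(decisions.astype(int), half, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, width)
    return np.median(windows, axis=1) > 0.5


def energy_vad(signal, frame_len=200, hop=80, width=5):
    """Return one boolean per frame: True where the frame is judged speech."""
    energies = frame_log_energy(signal, frame_len, hop)
    threshold = kmeans_threshold(energies)
    return median_smooth(energies > threshold, width)
```

Calling `energy_vad(samples)` on a 1-D float array returns a boolean array with one entry per frame. Because the threshold is re-estimated from each utterance's own energy distribution, it moves with the noise floor instead of staying fixed, which is the property the abstract attributes to the clustering approach. The median filter removes isolated frames that flip state because of short noise bursts.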
[Degree-granting institution]: Shanghai Normal University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TN912.3