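Research on the Application of Speech Recognition Technology in Human-Computer Interaction
Topic keywords: Mel-frequency cepstral coefficients (MFCC) + spectral subtraction; Source: North China University of Technology, 2017 master's thesis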
[Abstract]: After discussing the prospects and role of speech recognition in human-computer interaction, this thesis studies speech recognition technology in depth. First, it examines the main stages of a speech recognition pipeline and, for the endpoint detection stage, proposes a single-threshold endpoint detection method based on the cosine of MFCC parameters. Second, to improve the quality of the extracted features, it studies speech enhancement algorithms and proposes an enhancement method that combines the LMS algorithm with spectral subtraction. Finally, it proposes a speech recognition method that integrates these two algorithms, verifies its reliability experimentally, and uses it to build interactive software driven by speech. The main contents and contributions are as follows:
1) The thesis treats endpoint detection as the most important stage of speech recognition. After surveying a large number of endpoint detection algorithms, and building on the double-threshold endpoint detection algorithm based on the Euclidean distance between MFCCs, it proposes a single-threshold algorithm based on the cosine of MFCC parameters, called MFCC_COS. The algorithm computes the cosine of the MFCC parameters to separate speech segments from non-speech segments and makes the decision with a single threshold. It avoids the magnitude sensitivity of Euclidean distance and lowers the chance that two thresholds compound the error. It is simple to implement, performs well in noisy environments, and its detection accuracy does not fall off quickly as the noise level rises, so the thesis reports better robustness than traditional algorithms.
2) Noise is unavoidable in real environments, so speech files are enhanced before feature extraction in order to obtain higher-quality feature parameters. After surveying many speech enhancement methods, the thesis proposes LMSSS, which combines spectral subtraction with the LMS algorithm. LMSSS removes the musical noise left by spectral subtraction and sidesteps the filter delay of the LMS algorithm. Experiments show that it denoises better than spectral subtraction alone or LMS alone and that, within the tested range, its advantage grows as the noise gets stronger.
3) The final part proposes a speech recognition method that combines LMSSS with MFCC_COS. Experiments show that recognition accuracy improves further once LMSSS is integrated, and that the method is robust in noisy environments. An interactive speech recognition application was also built on this method.
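The abstract does not spell out how MFCC_COS is computed, so the following is only a minimal sketch of one plausible reading: each frame's feature vector is compared, by cosine similarity, with a noise reference averaged from the first few frames, and a single threshold decides whether the frame is speech. The function names, the `noise_frames`, `threshold`, and `min_run` parameters, and the assumption that the recording starts with noise are all hypothetical and not taken from the thesis. Short toy vectors stand in for real MFCC frames.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def detect_endpoints(
    features: Sequence[Sequence[float]],
    noise_frames: int = 5,    # assumption: the first frames contain only noise
    threshold: float = 0.8,   # assumption: the single decision threshold
    min_run: int = 3,         # assumption: ignore speech runs shorter than this
) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, judged to be speech."""
    if len(features) < noise_frames:
        raise ValueError("not enough frames to estimate the noise reference")
    dim = len(features[0])
    # Noise reference: per-coefficient mean of the leading frames.
    noise_ref = [
        sum(frame[i] for frame in features[:noise_frames]) / noise_frames
        for i in range(dim)
    ]
    # A frame counts as speech when it points away from the noise reference.
    is_speech = [cosine_similarity(f, noise_ref) < threshold for f in features]

    segments: List[Tuple[int, int]] = []
    start = None
    for idx, flag in enumerate(is_speech):
        if flag and start is None:
            start = idx
        elif not flag and start is not None:
            if idx - start >= min_run:
                segments.append((start, idx))
            start = None
    if start is not None and len(features) - start >= min_run:
        segments.append((start, len(features)))
    return segments


# Toy data: 6 noise-like frames, 5 speech-like frames, 4 noise-like frames.
noise = [1.0, 0.1, 0.0]
speech = [0.1, 1.0, 0.5]
toy_features = [noise] * 6 + [speech] * 5 + [noise] * 4
segments = detect_endpoints(toy_features)
```

Because the cosine depends only on the direction of a vector, scaling a frame (for example, a louder recording of the same sound) leaves the similarity unchanged, which is one way to understand the magnitude sensitivity that the abstract attributes to Euclidean distance.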
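[Degree-granting institution]: North China University of Technology
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TN912.34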
Article ID: 2118444
Article link: https://www.wllwen.com/kejilunwen/xinxigongchenglunwen/2118444.html