
Research and Implementation of Speech Recognition Algorithms Based on HMM and DNN

Published: 2018-11-11 13:25
[Abstract]: In 2016, artificial intelligence, virtual reality, and wearable devices became frontier research topics in the technology industry. All of them require interaction between people and computers. Speech is more efficient than keyboard and mouse input and can carry complex emotional expression, which greatly improves the interaction experience, so speech recognition is expected to be widely adopted as the most convenient form of human-computer interaction. Acoustic modeling in speech recognition has long relied on the GMM-HMM model, which offers reliable accuracy and can be trained with the mature EM algorithm, and it is therefore widely used in the field. However, the GMM is a shallow model, and its modeling capacity becomes clearly insufficient as the amount of data grows. Deep neural networks (DNNs), which model and learn complex data better, have become a research focus in speech recognition. This thesis studies recognition algorithms based on the HMM and DNN models, analyzes the strengths and weaknesses of both, and carries out the following work:
(1) Speech recognition algorithms based on the hidden Markov model (HMM) are studied in depth, and the CMUSphinx speech recognition platform is used to build a speech recognition system for robot control commands. The language model and acoustic model are obtained by training on speech recordings of ten robot control commands. Decoding experiments show an average word error rate of 7.1%, which indicates good recognition performance and a high recognition rate for small-vocabulary Chinese speech recognition.
(2) To address the shortcomings of the HMM model, the deep belief network (DBN), a type of deep neural network, is studied in depth. The Kaldi speech recognition toolkit is used to build a large-vocabulary continuous Chinese speech recognition system, with DNN acoustic models trained on the open-source Chinese speech corpus THCHS30. The experiments show that the DNN model lowers the word error rate by 5.79% compared with the triphone model, so the DNN model recognizes better in a large-vocabulary system. Kaldi is also used to train on the TIMIT speech corpus to obtain a large-vocabulary English speech recognition system, which achieves a high recognition rate.
(3) Noise interference has long been a difficulty in speech recognition. During acoustic model training with Kaldi, white noise, car background noise, and cafeteria background noise are added to the training and test speech for DNN training, and the results are compared across several models. The experiments show that the DAE model performs better in low-dimensional representation and can be used to recover noise-corrupted input.
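The error rates quoted in the abstract are word error rates. As a minimal sketch of how such a figure is usually computed, the Python function below takes the word-level edit distance between a reference transcript and a decoder hypothesis and divides it by the number of reference words. The function name `word_error_rate` and the example commands are assumptions made for illustration; the thesis does not list its ten commands, and this is not the scoring code used by CMUSphinx or Kaldi.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical command transcripts, used only to exercise the function.
print(word_error_rate("turn left now", "turn right now"))
print(word_error_rate("go forward", "go forward"))
```

Because insertions count as errors, the rate can exceed 1.0 when the hypothesis is much longer than the reference. A corpus-level figure is normally obtained by summing the edit distances over all utterances and dividing by the total number of reference words, not by averaging per-utterance rates.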
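[Degree-granting institution]: Jiangxi University of Science and Technology
[Degree level]: Master's
[Year conferred]: 2017
[Classification number]: TN912.34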



Article link: https://www.wllwen.com/kejilunwen/xinxigongchenglunwen/2324964.html

