
Research on Speech Recognition Based on Deep Learning

Published: 2018-01-18 03:23

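  Keywords: Research on Speech Recognition Based on Deep Learning   Source: Beijing University of Posts and Telecommunications, master's thesis, 2014   Type: dissertation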


  More related articles: speech recognition, deep learning, feature extraction, acoustic modeling, deep neural network, deep autoencoder


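[Chinese Abstract]: In the mobile Internet era, speech recognition, as a key technology for natural human-computer interaction, merits in-depth study. Meanwhile, amid the challenges of big data, deep learning has become a research hotspot in pattern recognition because it can mine useful information from massive amounts of data. Studying speech recognition on the basis of deep learning theory therefore has both theoretical significance and practical value.

Deep learning is essentially an information-extraction technique built from multiple layers of nonlinear transformations; its hierarchical feature structure allows it to model complex relationships in data. This thesis first introduces the basic principles and current state of speech recognition research and explains the fundamentals of deep learning and its network models in detail. It then focuses on how to apply deep learning more effectively to speech recognition.

1. Acoustic feature extraction based on deep autoencoders. Good acoustic features are crucial to the performance of a speech recognition system. Starting from the basic principles of the deep autoencoder, the thesis examines in some depth the preprocessing of acoustic features, the network structure (the number of hidden layers and the number of nodes per layer), and parallel training algorithms for the network. An autoencoder for speech features is built on the Matlab platform and trained in both unsupervised and supervised modes to extract more robust features from the original MFCC features. Finally, tests on the 863 Mandarin Chinese speech corpus with the HTK speech recognition toolkit show that, compared with the original features, the features extracted in the unsupervised and supervised modes improve word recognition accuracy by 1.96% and 3.53%, respectively.

2. Acoustic modeling based on DNN-HMM. The acoustic model is an indispensable component of a speech recognition system. By analyzing the similarities and differences between deep neural networks (DNNs) and Gaussian mixture models (GMMs) in structure and training, the thesis argues that a DNN can feasibly model the output probability distributions of HMM states. Both GMM-HMM and DNN-HMM acoustic models are implemented on the open-source Kaldi speech recognition toolkit, and experiments on the RM corpus show that the DNN-HMM system achieves a 30% relative reduction in word error rate compared with the GMM-HMM system.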
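The thesis builds its autoencoder in Matlab and evaluates the features with HTK; neither is reproduced here. As a rough illustration of the unsupervised variant only, the pure-Python sketch below normalizes feature frames, trains an autoencoder to reconstruct them without labels, and then takes the hidden-layer activations as the new features. Everything concrete is an assumption: per-dimension mean/variance normalization stands in for the unspecified preprocessing, a single sigmoid bottleneck layer replaces the deep (multi-layer) network, training is plain per-frame SGD on squared reconstruction error, and synthetic 6-dimensional vectors take the place of real MFCCs. The names `cmvn`, `TinyAutoencoder` and `make_toy_frames` are hypothetical.

```python
import math
import random


def cmvn(frames):
    """Per-dimension mean/variance normalization (assumed preprocessing step)."""
    n, dim = len(frames), len(frames[0])
    means = [sum(f[j] for f in frames) / n for j in range(dim)]
    # "or 1.0" guards against a constant dimension (std == 0)
    stds = [math.sqrt(sum((f[j] - means[j]) ** 2 for f in frames) / n) or 1.0
            for j in range(dim)]
    return [[(f[j] - means[j]) / stds[j] for j in range(dim)] for f in frames]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


class TinyAutoencoder:
    """Hypothetical one-bottleneck autoencoder: sigmoid encoder, linear decoder."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        s = 1.0 / math.sqrt(n_in)
        self.w1 = [[rng.uniform(-s, s) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-s, s) for _ in range(n_hidden)] for _ in range(n_in)]
        self.b2 = [0.0] * n_in

    def encode(self, x):
        """Hidden activations: these are the extracted features."""
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def decode(self, h):
        return [sum(w * hj for w, hj in zip(row, h)) + b
                for row, b in zip(self.w2, self.b2)]

    def loss(self, frames):
        """Mean over frames of 0.5 * squared reconstruction error."""
        total = 0.0
        for x in frames:
            y = self.decode(self.encode(x))
            total += 0.5 * sum((yk - xk) ** 2 for yk, xk in zip(y, x))
        return total / len(frames)

    def sgd_step(self, x, lr):
        """One backpropagation update on a single frame; no labels are used."""
        h = self.encode(x)
        y = self.decode(h)
        dy = [yk - xk for yk, xk in zip(y, x)]  # dL/dy for a linear output layer
        # dL/da for each hidden unit via the sigmoid derivative h * (1 - h);
        # computed before any weight is changed
        dh = [sum(dy[k] * self.w2[k][j] for k in range(len(dy))) * h[j] * (1 - h[j])
              for j in range(len(h))]
        for k, dyk in enumerate(dy):
            self.w2[k] = [w - lr * dyk * hj for w, hj in zip(self.w2[k], h)]
            self.b2[k] -= lr * dyk
        for j, dhj in enumerate(dh):
            self.w1[j] = [w - lr * dhj * xi for w, xi in zip(self.w1[j], x)]
            self.b1[j] -= lr * dhj


def make_toy_frames(n=200, seed=1):
    """Synthetic 6-dim stand-ins for MFCC frames: two latent factors plus noise."""
    rng = random.Random(seed)
    frames = []
    for _ in range(n):
        a, b = rng.gauss(0, 1), rng.gauss(0, 1)
        frames.append([a, b, a + b, a - b, 0.5 * a, rng.gauss(0, 0.1)])
    return frames


if __name__ == "__main__":
    frames = cmvn(make_toy_frames())
    ae = TinyAutoencoder(n_in=6, n_hidden=3)
    print("loss before:", round(ae.loss(frames), 3))
    for epoch in range(20):
        for x in frames:
            ae.sgd_step(x, lr=0.05)
    print("loss after:", round(ae.loss(frames), 3))
    features = [ae.encode(x) for x in frames]  # new 3-dim feature per frame
    print(len(features), "frames x", len(features[0]), "features")
```

A deep autoencoder would stack several such layers (often pretrained layer by layer) and, in the supervised variant, fine-tune the network with frame labels; the abstract does not say how the thesis configures either.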
[English Abstract]: In the era of the mobile Internet, speech recognition is a key technology for natural human-computer interaction and is worthy of further study. At the same time, faced with the challenge of big data, deep learning has become a research hotspot in pattern recognition because it can mine effective information from massive data. Research on speech recognition based on deep learning theory therefore has theoretical significance and practical value. Deep learning is essentially an information-extraction technology based on multi-layer nonlinear transformations, which models complex relationships in data through its hierarchical feature structure. This paper first introduces the basic principles and research status of speech recognition and expounds the basic theory of deep learning and its network models in detail. It then focuses on how to better apply deep learning theory to speech recognition.

1. The acoustic feature extraction method based on the deep autoencoder is studied. Good acoustic features are very important to the performance of a speech recognition system. Starting from the basic principle of the deep autoencoder, this paper discusses acoustic feature preprocessing, the network structure (including the number of hidden layers and nodes) and the parallel training algorithm. An autoencoder based on speech features is constructed on the Matlab platform, and unsupervised and supervised training methods are used to extract more robust speech features from the original MFCC features. Finally, the HTK speech recognition framework is used for tests on the 863 Chinese speech corpus: compared with the original features, the features extracted with unsupervised and supervised training increase word recognition accuracy by 1.96% and 3.53%, respectively.

2. The acoustic modeling method based on DNN-HMM is studied. The acoustic model is an indispensable part of a speech recognition system. This paper analyzes the similarities and differences in structure and training methods between deep neural networks and Gaussian mixture models, and expounds the feasibility of using a DNN to describe the output probability distribution of HMM states. Acoustic models based on GMM-HMM and DNN-HMM are implemented on the Kaldi open-source speech recognition platform, and experiments on the RM corpus show that the word error rate of the DNN-HMM model is 30% lower, in relative terms, than that of the GMM-HMM model.
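In a hybrid DNN-HMM system the network is trained to output HMM state posteriors p(s|x), whereas the HMM decoder needs emission likelihoods p(x|s). The usual conversion is Bayes' rule, p(x|s) = p(s|x) p(x) / p(s), where p(x) is the same for every state and can be dropped. The pure-Python sketch below shows that step plus a toy Viterbi pass. It is not Kaldi code and not the thesis's implementation: the logits stand in for DNN outputs, the three-state left-to-right topology and its transition probabilities are hypothetical, and the priors are estimated from state counts in a hypothetical forced alignment.

```python
import math

NEG_INF = float("-inf")


def log_softmax(logits):
    """Turn DNN output-layer logits into log state posteriors log p(s|x)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def state_priors(alignment, n_states, floor=1e-5):
    """Estimate p(s) from frame-level state labels; the floor avoids log(0)."""
    counts = [0] * n_states
    for s in alignment:
        counts[s] += 1
    return [max(c / len(alignment), floor) for c in counts]


def scaled_log_likelihoods(logits, priors):
    """log p(s|x) - log p(s), i.e. log p(x|s) up to a per-frame constant."""
    return [lp - math.log(p) for lp, p in zip(log_softmax(logits), priors)]


def viterbi(emissions, log_trans, log_init):
    """Most likely state sequence given per-frame emission log-scores."""
    n = len(log_init)
    score = [log_init[s] + emissions[0][s] for s in range(n)]
    backptrs = []
    for frame in emissions[1:]:
        prev, score, ptrs = score, [], []
        for s in range(n):
            # best predecessor state r for state s at this frame
            best = max(range(n), key=lambda r: prev[r] + log_trans[r][s])
            ptrs.append(best)
            score.append(prev[best] + log_trans[best][s] + frame[s])
        backptrs.append(ptrs)
    path = [max(range(n), key=lambda s: score[s])]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    # Hypothetical DNN logits for 6 frames over a 3-state left-to-right HMM
    logits = [[3, 0, 0], [2, 1, 0], [0, 3, 0], [0, 2, 1], [0, 0, 3], [0, 1, 3]]
    priors = state_priors([0, 0, 1, 1, 2, 2], n_states=3)  # hypothetical alignment
    emissions = [scaled_log_likelihoods(f, priors) for f in logits]
    half = math.log(0.5)
    log_trans = [[half, half, NEG_INF],
                 [NEG_INF, half, half],
                 [NEG_INF, NEG_INF, 0.0]]
    print(viterbi(emissions, log_trans, [0.0, NEG_INF, NEG_INF]))  # [0, 0, 1, 1, 2, 2]
```

Dividing by the prior matters when state frequencies are unbalanced: a frequent state (silence, for example) receives high posteriors partly because it is common, and the division removes that bias before the HMM combines the scores with its transition probabilities.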
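[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TN912.34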




Article link: https://www.wllwen.com/kejilunwen/wltx/1439246.html
