Research and Implementation of Speech Recognition Algorithms Based on HMM and DNN

Published: 2018-11-11 13:25

[Abstract]: In 2016, artificial intelligence, virtual reality, and wearable devices became frontier research topics in the technology industry, and all of them depend on interaction between people and computers. Speech is a more efficient channel for that interaction than keyboard and mouse, and because it carries rich emotional expression it substantially improves the interaction experience. Speech recognition is therefore expected to be widely adopted as the most convenient form of human-computer interaction. Acoustic modeling in speech recognition has long relied on the GMM-HMM model, which offers reliable accuracy and can be trained with the mature EM algorithm, and which has consequently been widely used. However, the GMM is a shallow model, and its modeling capacity becomes clearly insufficient as the amount of data grows. Deep neural networks (DNNs), which model and learn from complex data more effectively, have become a focus of speech recognition research. This thesis studies recognition algorithms based on the HMM and the DNN, analyzes the strengths and weaknesses of both models, and makes three main contributions. (1) It studies speech recognition based on the hidden Markov model (HMM) and uses the CMUSphinx speech recognition platform to build a speech recognition system for robot control commands, training a language model and an acoustic model on speech recordings of ten robot control commands. Decoding experiments show an average word error rate of 7.1%, indicating good recognition performance and a high recognition rate for small-vocabulary Chinese speech recognition. (2) To address the shortcomings of the HMM, it studies the deep belief network (DBN), a type of deep neural network, and uses the Kaldi speech recognition toolkit to build a large-vocabulary continuous Chinese speech recognition system, training a DNN acoustic model on the open-source Chinese speech corpus THCHS30. Experiments show that the DNN model's word error rate is 5.79% lower than that of the triphone model, so the DNN model performs better in large-vocabulary speech recognition. The thesis also uses Kaldi to train a large-vocabulary English speech recognition system on the TIMIT corpus, which achieves a high recognition rate. (3) Noise interference has long been a difficulty in speech recognition. During acoustic model training with Kaldi, white noise, car background noise, and cafeteria background noise are added to the training and test speech before DNN training, and the results are compared across several models. The experiments show that the DAE model performs better at low-dimensional representation and can be used to recover noise-corrupted input.
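The error rates quoted in the abstract are word error rates. As a rough illustration of how such a figure is usually defined, the sketch below computes WER as the token-level edit distance (substitutions, deletions, insertions) divided by the number of reference tokens. It is a minimal sketch, not the scoring procedure used in the thesis: the function name `word_error_rate`, the whitespace tokenization, and the example command strings are all assumptions made for illustration, and CMUSphinx and Kaldi ship their own scoring tools.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Hypothetical helper for illustration; tokens are split on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_tok in enumerate(hyp, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference token missing
                curr[j - 1] + 1,     # insertion: extra hypothesis token
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up command strings, not data from the thesis.
    print(word_error_rate("move forward two steps", "move forward to steps"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference. For Chinese, the result also depends on whether scoring is done over words or characters, which the abstract does not specify.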
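[Degree-granting institution]: Jiangxi University of Science and Technology
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TN912.34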

[References]