Research on an Embedded Speech Recognition System Based on Neural Networks
Published: 2018-07-28 16:56
[Abstract]: Speech recognition is the technology by which a machine, running a specific program, converts human speech into the corresponding text or commands. In recent years, thanks to rapid advances in computer hardware and communication networks, speech recognition research has produced many encouraging results, and a number of relatively mature products have reached the market. The rise of an operating model that combines local recognition with cloud technology can relieve the long-standing constraints of limited computing power and storage in embedded speech recognition systems, letting researchers concentrate on improving recognition accuracy. Several classical recognition algorithms are built on linear system theory, whereas human speech production is in fact a complex nonlinear process, so recognition systems based on linear system theory have certain limitations in real environments. This thesis carries out research and experiments aimed at improving the accuracy and generalization ability of speech recognition systems.

A speech recognition system generally comprises speech preprocessing, feature parameter extraction, a recognition model, and speech synthesis. The thesis first reviews the history of speech recognition technology and the state of research in China and abroad, then studies the theory and algorithms of each stage: speech acquisition, preprocessing, endpoint detection, feature parameter extraction, the time-warping network, and the recognition model. MFCC is chosen as the speech feature parameter, and a complete design for a speech recognition system is presented. The work focuses on the choice of recognition model: after comparing various recognition algorithms, the BP neural network is selected as the basic unit of the model. To address recognition accuracy and the shortcomings of the BP algorithm, neural network ensemble theory is introduced. To increase diversity among the individual networks in the ensemble, K-means clustering is used to improve the stage that generates the individual networks, and multiple BP networks are finally combined into the recognition model of this thesis.

To verify the method, a speech recognition system combining MFCC feature parameters with the improved BP neural network ensemble was designed and developed on both the MATLAB and VC6.0 platforms; performance analysis and comparison of the simulation results confirm the effectiveness of the method.

Finally, building on a survey of current embedded systems, the thesis adopts the popular Android mobile operating system. For a specific hardware platform, it describes the Android software architecture and the procedure for setting up the application development environment in detail. Android 2.3.4 was successfully customized on a development board built around an ARM11 core, and a simple application was run on that platform.
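The abstract does not say exactly how K-means enters the generation of the individual networks, so the following is only a rough sketch under one assumed reading: the training samples of each class are split into K clusters, network j is trained on cluster j of every class, and the ensemble averages the members' class probabilities. All function names are hypothetical, the BP network is a plain one-hidden-layer network trained by backpropagation, and synthetic 2-D points stand in for MFCC feature vectors; none of this is taken from the thesis itself.

```python
import numpy as np

def kmeans_labels(X, k, iters=20, seed=0):
    """Plain K-means; returns the cluster index of every row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):          # keep the old center if a cluster is empty
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def softmax(Z):
    E = np.exp(Z - Z.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def train_bp(X, y, n_classes, hidden=8, epochs=500, lr=0.5, seed=0):
    """One-hidden-layer network trained by backpropagation (batch gradient descent)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, n_classes)); b2 = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                 # one-hot targets
    for _ in range(epochs):
        H = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))   # forward pass, sigmoid hidden layer
        P = softmax(H @ W2 + b2)
        dZ2 = (P - Y) / len(X)               # cross-entropy gradient at the output
        dZ1 = (dZ2 @ W2.T) * H * (1.0 - H)   # error propagated back to the hidden layer
        W2 -= lr * (H.T @ dZ2); b2 -= lr * dZ2.sum(axis=0)
        W1 -= lr * (X.T @ dZ1); b1 -= lr * dZ1.sum(axis=0)
    return W1, b1, W2, b2

def predict_proba(net, X):
    W1, b1, W2, b2 = net
    H = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))
    return softmax(H @ W2 + b2)

def train_ensemble(X, y, n_classes, k=3):
    """Assumed scheme: cluster each class; network j trains on cluster j of every class."""
    member = np.zeros(len(X), dtype=int)
    for c in range(n_classes):
        idx = np.where(y == c)[0]
        member[idx] = kmeans_labels(X[idx], k, seed=c)
    return [train_bp(X[member == j], y[member == j], n_classes, seed=j)
            for j in range(k)]

def ensemble_predict(nets, X):
    """Average the members' class probabilities and take the arg max."""
    return np.mean([predict_proba(n, X) for n in nets], axis=0).argmax(axis=1)

def demo(seed=0):
    """Train on synthetic 3-class data and return held-out accuracy."""
    rng = np.random.default_rng(seed)
    centers = np.array([[-2.0, -2.0], [2.0, -2.0], [0.0, 2.0]])
    y = np.repeat(np.arange(3), 80)
    X = centers[y] + rng.normal(0.0, 0.5, (len(y), 2))
    train = rng.permutation(len(y))[:180]
    test = np.setdiff1d(np.arange(len(y)), train)
    nets = train_ensemble(X[train], y[train], n_classes=3, k=3)
    return float(np.mean(ensemble_predict(nets, X[test]) == y[test]))

if __name__ == "__main__":
    print("held-out accuracy:", demo())
```

Clustering within each class, rather than over the whole training set, is a design choice of this sketch: it keeps every member network exposed to all classes while still giving each one a different region of the feature space, which is one simple way to encourage diversity among ensemble members.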
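[Degree-granting institution]: Guangdong University of Technology
[Degree level]: Master's
[Year degree awarded]: 2012
[Classification number]: TP368.1; TN912.34
Article ID: 2150946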
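[Citing literature]
Related master's theses (1 entry)
1 Bu Xuezhe (卜学哲); Research and Implementation of Speech Recognition Algorithms on the ARM-Linux Platform [D]; Hebei University of Science and Technology; 2013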
Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2150946.html