Research on Natural Language Understanding for Instrument Verification Robots

Published: 2018-03-27 16:17

Topic: speech recognition; Focus: instrument verification; Source: Harbin Institute of Technology, master's thesis, 2017

[Abstract]: Recognizing and understanding spoken commands is a prerequisite for the intelligent control of an instrument verification robot. Because common commercial speech recognition software performs poorly on verification terminology, this thesis studies natural language understanding for an instrument verification robot. Working on both the acoustic model and the language model, it analyzes the key factors that determine speech recognition performance and then interprets the recognized command sentences. Since the set of control commands is expected to grow during use, a speaker adaptation module is also introduced to address data sparsity in training. First, the speech signal is preprocessed with short-time analysis techniques, namely pre-emphasis and short-time windowing, which reduce the attenuation and interference introduced during acquisition and transmission and yield a series of short-time quasi-stationary segments for subsequent computation. Feature vectors based on MFCC parameters are then extracted: 12 MFCC coefficients and the log energy, together with their first- and second-order differences, for 39 dimensions in total. This turns the speech signal, which cannot be processed directly, into a mathematical representation. Next, HMM (hidden Markov model) acoustic models are built for the recognition units, using a 5-state left-to-right topology whose outputs are continuous Gaussian mixture distributions with 5 mixture components. The models are trained with embedded Baum-Welch re-estimation: context-independent monophone models are trained first, and context-dependent triphone models with better recognition performance are trained from them. Decoding uses Viterbi-based token passing and gives a recognition rate of 90.2%. To overcome the limits of relying on the acoustic model alone, a bigram statistical language model that encodes the contextual dependencies between words is incorporated, which raises the recognition rate to 98.9%.
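The front-end steps above (pre-emphasis, short-time windowing, log energy, and 39-dimensional feature stacking) can be illustrated with a short NumPy sketch. The abstract does not give the pre-emphasis coefficient, frame length, frame shift, window type, or delta window. The values used here (alpha = 0.97, 25 ms frames with a 10 ms shift at 16 kHz, a Hamming window, and the regression delta formula with a window of 2) are therefore assumed common defaults, not the thesis's settings. The mel filterbank and DCT stages that produce the 12 MFCCs are omitted, so `stack_39` takes them as input. All function names are hypothetical.

```python
import numpy as np

def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1] (alpha is assumed)."""
    x = np.asarray(signal, dtype=float)
    return np.append(x[0], x[1:] - alpha * x[:-1])

def frame_and_window(signal, sample_rate=16000, frame_ms=25, shift_ms=10):
    """Cut the signal into overlapping short-time frames and apply a Hamming window."""
    x = np.asarray(signal, dtype=float)
    frame_len = int(sample_rate * frame_ms / 1000)
    shift = int(sample_rate * shift_ms / 1000)
    if len(x) < frame_len:                       # zero-pad very short inputs to one frame
        x = np.pad(x, (0, frame_len - len(x)))
    n_frames = 1 + (len(x) - frame_len) // shift
    idx = np.arange(frame_len)[None, :] + shift * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)

def log_energy(frames, floor=1e-10):
    """Per-frame log energy; the floor avoids log(0) on silent frames."""
    return np.log(np.maximum(np.sum(frames ** 2, axis=1), floor))

def deltas(feats, theta=2):
    """Regression deltas d_t = sum_k k*(c[t+k] - c[t-k]) / (2 * sum_k k^2), k = 1..theta.
    Edge frames are replicated so the output keeps the same number of frames."""
    c = np.asarray(feats, dtype=float)
    T = c.shape[0]
    padded = np.pad(c, ((theta, theta), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, theta + 1))
    out = np.zeros_like(c)
    for k in range(1, theta + 1):
        out += k * (padded[theta + k:theta + k + T] - padded[theta - k:theta - k + T])
    return out / denom

def stack_39(mfcc12, log_e):
    """12 MFCCs + log energy = 13 static dims; add first and second differences -> 39."""
    static = np.hstack([mfcc12, log_e[:, None]])  # (T, 13)
    d1 = deltas(static)                           # (T, 13) first-order differences
    d2 = deltas(d1)                               # (T, 13) second-order differences
    return np.hstack([static, d1, d2])            # (T, 39)

# Usage on one second of a synthetic tone; the MFCCs are random stand-ins here.
sr = 16000
t = np.arange(sr) / sr
frames = frame_and_window(pre_emphasis(np.sin(2 * np.pi * 440 * t)), sample_rate=sr)
fake_mfcc = np.random.default_rng(0).normal(size=(frames.shape[0], 12))
features = stack_39(fake_mfcc, log_energy(frames))
print(features.shape)  # (98, 39)
```

The dimension count works out as (12 + 1) static coefficients times 3 streams (static, first difference, second difference) = 39. Computing the second-order differences by applying the same regression to the first-order stream is one common choice. The thesis may have computed them differently.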
Each command is assigned a unique ID number. This provides a simple form of natural language understanding and converts the recognition result into a form the machine can process. Finally, new command entries added during the robot's practical use can leave the training data sparse. To address this, speaker adaptation is performed with a hybrid MAP/MLLR algorithm built on the MAP and MLLR algorithms, which raises the recognition rate from 15.5% before adaptation to 85.0%. The resulting adaptive design robustly matches a new speaker's speech characteristics using only a small amount of training data from the target speaker.
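The abstract does not say how the MAP/MLLR hybrid is constructed. The sketch below shows one common scheme, under that assumption, in NumPy. It first estimates a single global MLLR transform of the Gaussian means (mean-only, diagonal covariances). It then applies a MAP mean update that uses the MLLR-adapted means as its prior. The per-Gaussian statistics `occ` (sum over t of gamma_m(t)) and `sum_obs` (sum over t of gamma_m(t) o_t) are assumed to come from a forward-backward pass over the adaptation data, which is not shown. The prior weight tau = 10 and all function names are hypothetical.

```python
import numpy as np

def mllr_mean_transform(means, variances, occ, sum_obs):
    """Estimate one global MLLR mean transform W of shape (D, D+1), diagonal covariances.

    means, variances: (M, D) speaker-independent Gaussian parameters.
    occ:      (M,)   occupation count sum_t gamma_m(t) per Gaussian.
    sum_obs:  (M, D) occupation-weighted observation sums sum_t gamma_m(t) o_t.
    """
    M, D = means.shape
    xi = np.hstack([np.ones((M, 1)), means])          # extended mean vectors [1, mu]
    W = np.zeros((D, D + 1))
    for i in range(D):                                # each row of W is solved independently
        w = occ / variances[:, i]
        G = (xi * w[:, None]).T @ xi                  # (D+1, D+1) weighted scatter of xi
        k = xi.T @ (sum_obs[:, i] / variances[:, i])  # (D+1,) weighted data term
        W[i] = np.linalg.solve(G, k)
    return W

def apply_mllr(W, means):
    """Adapted means mu_hat = W [1, mu] for every Gaussian."""
    xi = np.hstack([np.ones((means.shape[0], 1)), means])
    return xi @ W.T

def map_means(prior_means, occ, sum_obs, tau=10.0):
    """MAP mean update: (tau * prior + sum_obs) / (tau + occ), per Gaussian."""
    return (tau * prior_means + sum_obs) / (tau + occ)[:, None]

def mllr_then_map(means, variances, occ, sum_obs, tau=10.0):
    """Hypothetical hybrid: MLLR-transformed means serve as the MAP prior."""
    W = mllr_mean_transform(means, variances, occ, sum_obs)
    return map_means(apply_mllr(W, means), occ, sum_obs, tau)

# Usage with synthetic statistics: the "new speaker" shifts every mean by an
# affine map, and Gaussian 0 receives no adaptation data at all.
rng = np.random.default_rng(1)
M, D = 6, 3
means = rng.normal(size=(M, D))
variances = rng.uniform(0.5, 2.0, size=(M, D))
target = 1.5 * means + np.array([0.2, -0.1, 0.3])
occ = np.full(M, 20.0)
occ[0] = 0.0
sum_obs = occ[:, None] * target
adapted = mllr_then_map(means, variances, occ, sum_obs)
print(np.allclose(adapted[0], target[0]))                        # True: moved by MLLR
print(np.allclose(map_means(means, occ, sum_obs)[0], means[0]))  # True: MAP alone leaves it
```

The example shows why such hybrids suit sparse adaptation data. MAP alone leaves a Gaussian unchanged when it receives no adaptation data. The shared MLLR transform, by contrast, moves every Gaussian. When MLLR's output becomes the MAP prior, unseen Gaussians still benefit from the global transform, and well-observed Gaussians can move further toward the new speaker's data.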
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year awarded]: 2017
[Classification number]: TN912.3
