Research on Chinese Spoken Keyword Detection Technology

Published: 2018-08-08 14:38

[Abstract]: With the development of deep learning, deep neural networks (DNNs) and recurrent neural networks (RNNs) have been successfully applied to English speech recognition and spoken keyword detection systems. This thesis studies acoustic modeling of Chinese initials and finals using a deep neural network-hidden Markov model (DNN-HMM) and a recurrent neural network with long short-term memory units (LSTM-RNN), with the aim of improving the performance of existing Chinese spoken keyword detection systems. It first introduces the framework and principles of continuous speech recognition, including speech feature extraction, acoustic modeling, the pronunciation dictionary and language model, and the decoding network based on weighted finite-state transducers. Four acoustic features are covered: perceptual linear prediction (PLP) coefficients, Mel-frequency cepstral coefficients (MFCCs), filter-bank features, and fundamental frequency (pitch) features. It then examines keyword detection built on a continuous speech recognizer, including lattice-based indexing, keyword search methods, confidence measures for keyword verification, and evaluation metrics for keyword detection systems. The thesis also studies a Chinese spoken keyword detection system that uses initials and finals, which are recognized with high accuracy, as the units for acoustic modeling and retrieval; keywords entered as Chinese characters are converted to initial and final sequences by table lookup before detection. Acoustic models based on DNN-HMM and on LSTM-RNN were trained separately and used to build Chinese spoken keyword detection systems, which achieved recall rates of 73.32% and 79.84% respectively, indicating that LSTM-RNN acoustic modeling can improve keyword detection performance. Different acoustic features were then compared on these systems, and the results show that fundamental frequency features improve detection performance to some extent. Fused confidence scores were then used to further improve system performance. Next, the two systems were compared on training data of different scales, and their respective ranges of application are discussed. Finally, a Chinese spoken keyword detection system based on system fusion, with a higher recall rate, is proposed.
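The table-lookup conversion and the recall metric mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the table `SYLLABLE_TABLE`, the functions `keyword_to_units` and `recall`, the toneless initial/final spellings, and the counts in the usage note are all assumptions made for illustration.

```python
# Hypothetical lookup table: each Chinese character maps to an
# (initial, final) pair. Entries are illustrative only; tones are
# omitted and this is not the lexicon used in the thesis.
SYLLABLE_TABLE = {
    "南": ("n", "an"),
    "京": ("j", "ing"),
    "理": ("l", "i"),
    "工": ("g", "ong"),
    "大": ("d", "a"),
    "学": ("x", "ue"),
}


def keyword_to_units(keyword):
    """Convert a keyword written in Chinese characters into a flat
    sequence of initial/final units by table lookup."""
    units = []
    for char in keyword:
        if char not in SYLLABLE_TABLE:
            raise KeyError(f"character not in lookup table: {char}")
        initial, final = SYLLABLE_TABLE[char]
        units.extend([initial, final])
    return units


def recall(correct_detections, reference_occurrences):
    """Recall = correctly detected keyword occurrences divided by the
    number of keyword occurrences actually present in the audio."""
    if reference_occurrences <= 0:
        raise ValueError("reference_occurrences must be positive")
    return correct_detections / reference_occurrences
```

Under these assumptions, `keyword_to_units("南京")` yields the unit sequence `n an j ing`, which is what a search over an initial/final index would look for. With made-up counts, `recall(8, 10)` gives 0.8; the thesis reports its recall figures in the same sense, as a percentage.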
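[Degree-granting institution]: Nanjing University of Science and Technology
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TN912.3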
