Research on Speech Keyword Detection Based on DMLS

Published: 2018-03-09 19:23

Topic: keyword detection　Focus: dynamic match lattice spotting　Source: PLA Information Engineering University, 2014 master's thesis　Type: degree thesis

[Abstract]: Speech keyword detection is the process of finding all likely occurrences of given words in speech data. It is one of the solutions for processing spoken language effectively and for achieving intelligent human-machine communication, and it has broad application prospects. Dynamic Match Lattice Spotting (DMLS) is currently one of the mainstream approaches to keyword detection. DMLS combines fast lattice-based detection with dynamic sequence matching: during lattice search it uses the minimum edit distance to compensate for the insertion, deletion, and substitution errors of the phoneme recognizer, which makes keyword detection both fast and accurate. Based on the characteristics of DMLS, this thesis studies lattice generation, index construction, keyword confidence measures, and out-of-vocabulary (OOV) word detection. The main work and contributions are as follows:
(1) The accuracy of the phoneme lattice directly affects keyword detection performance. To improve lattice accuracy, TRAP features and a multilayer perceptron are used to build a more accurate phoneme lattice generation system, and a DMLS-based baseline keyword detection system is built on top of it. The system traverses the lattice with a modified Viterbi algorithm to create a database of fixed-length phoneme sequences (Sequence Database, SDB), and in the search stage it uses the minimum edit distance as the confidence measure for detecting keywords. Experimental results show that the baseline system built on lattices generated from TRAP features has an advantage over systems using MFCC and PLP features, with recall improved by about 5%.
(2) SDB creation in the DMLS indexing stage loses part of the lattice information, and a query can be longer than the indexed sequence length. To address both problems, an improved hybrid index is proposed that merges the maximum-probability phoneme sequence with the SDB. The maximum-probability phoneme sequence is the complete 1-best recognition result. It represents the globally optimal result over the whole lattice and so complements the SDB. Because it is not constrained by the phoneme sequence length N, it can also assist in detecting queries with long phoneme sequences. Experimental results show that the hybrid index improves the figure of merit (FOM) by 1.4% over a system using the SDB index alone.
(3) In a DMLS-based keyword detection system, using the minimum edit distance as the detection confidence raises the detection rate but also increases the false alarm rate. To address this, a hybrid confidence measure that incorporates posterior probability is proposed. The method first introduces lattice-based posterior probabilities into DMLS index construction, then applies data-driven phoneme substitution, insertion, and deletion costs to allow more flexible approximate matching, and finally detects keywords by combining the minimum edit distance with the posterior probability confidence score. Experimental results show that the minimum edit distance and the posterior probability confidence are complementary, and the equal error rate (EER) of the system is reduced by 13.3% relative.
(4) For the OOV word problem in keyword detection, a method that fuses query expansion with dynamic matching is proposed. Query expansion and dynamic matching compensate for the uncertainty of OOV word pronunciations at different levels, so they are potentially complementary. Two fusion methods are studied. The first is result fusion: query expansion and dynamic matching are applied in parallel to detect OOV words, and the detection results are then merged. The second is confidence fusion: the minimum edit distance and the pronunciation score are combined into a hybrid confidence measure for detecting and verifying OOV words. Experimental results show that the second fusion method performs better, improving the FOM of the system by 19.8% relative.
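The abstract's core matching step, minimum edit distance between a query phoneme sequence and sequences stored in an SDB, can be illustrated with a minimal sketch. It is not the thesis's implementation. The names `Entry`, `unit_sub_cost`, `min_edit_distance`, and `search`, the toy phoneme labels, and the unit costs are all hypothetical, and each stored sequence is compared in full with the query for simplicity. Passing a different `sub_cost` function is where data-driven substitution costs, as mentioned in contribution (3), could be plugged in.

```python
from typing import Callable, List, NamedTuple, Sequence, Tuple


class Entry(NamedTuple):
    """One hypothetical SDB record: a phoneme sequence and where it was observed."""
    utterance: str
    phonemes: Sequence[str]


def unit_sub_cost(a: str, b: str) -> float:
    """Assumed default: substituting a phoneme costs 1, matching costs 0."""
    return 0.0 if a == b else 1.0


def min_edit_distance(
    query: Sequence[str],
    observed: Sequence[str],
    sub_cost: Callable[[str, str], float] = unit_sub_cost,
    ins_cost: float = 1.0,
    del_cost: float = 1.0,
) -> float:
    """Weighted edit distance between a query and an observed phoneme sequence.

    ins_cost: the observed sequence has an extra phoneme (recognizer insertion).
    del_cost: a query phoneme is missing from the observed sequence (recognizer deletion).
    """
    # prev[j] holds the cost of aligning the empty query prefix with observed[:j].
    prev = [j * ins_cost for j in range(len(observed) + 1)]
    for i, q in enumerate(query, start=1):
        curr = [i * del_cost]
        for j, o in enumerate(observed, start=1):
            curr.append(min(
                prev[j] + del_cost,            # query phoneme q was dropped
                curr[j - 1] + ins_cost,        # observed phoneme o is spurious
                prev[j - 1] + sub_cost(q, o),  # match or substitution
            ))
        prev = curr
    return prev[-1]


def search(
    query: Sequence[str], sdb: Sequence[Entry], max_distance: float
) -> List[Tuple[float, Entry]]:
    """Return SDB entries whose distance to the query is within the threshold, best first."""
    scored = [(min_edit_distance(query, e.phonemes), e) for e in sdb]
    return sorted((s for s in scored if s[0] <= max_distance), key=lambda s: s[0])


# Toy SDB with made-up phoneme labels.
sdb = [
    Entry("utt1", ("k", "ae", "t")),
    Entry("utt2", ("k", "eh", "t")),
    Entry("utt3", ("d", "ao", "g")),
]
hits = search(("k", "ae", "t"), sdb, max_distance=1.0)
```

With a threshold of 1.0, the exact match in `utt1` and the one-substitution match in `utt2` are returned, while `utt3` is rejected. Raising the threshold recovers more recognizer errors at the price of more false alarms, which is the trade-off contribution (3) sets out to address.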

[Degree-granting institution]: PLA Information Engineering University
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TN912.3

[Similar literature]