Research on Word-Lattice-Based Spoken Document Retrieval

Published: 2018-04-22 11:28

Topic: spoken document retrieval + word lattice; Source: master's thesis, PLA Information Engineering University, 2012

[Abstract]: Spoken document retrieval is the process of searching massive speech collections and returning the spoken documents or speech segments relevant to a user's query. It has important applications in information security, voice search engines, and the classification and management of speech resources. In recent years, lattice-based retrieval has rapidly become the mainstream approach to spoken document retrieval and has attracted growing attention. However, while the lattice structure retains more correct recognition hypotheses, it also brings new problems and challenges. Targeting the characteristics of Chinese lattices, this thesis studies lattice structure improvement, the choice of optimal recognition and retrieval units, and the re-ranking of relevant documents, with the aim of speeding up retrieval and improving retrieval accuracy. The main work covers the following three aspects.

(1) To address the fact that conventional lattice generation ignores phonetic knowledge such as phonological attributes, a lattice structure improvement method that incorporates phonological attributes is proposed. Because lattices from different sources carry complementary information, the method first builds a lattice with a speech recognition system based on phonological attribute detection, and then fuses it with the lattice produced by a conventional automatic speech recognition system. To counter the growth in lattice size after fusion, a position-based segment alignment method is used to compress the structure, yielding a compact lattice that incorporates phonological attributes. Experimental results show that the improved lattice contains more correct recognition results: index coverage rises from 77.83% to 80.34%, the lattice error rate falls from 25.31% to 19.66%, and retrieval performance improves accordingly.

(2) To address the mismatch between the optimal recognition unit and the optimal retrieval unit in Chinese spoken document retrieval, an indexing method based on a subword PSPL (Position Specific Posterior Lattice) is proposed. The method first decodes the spoken documents with words as the recognition unit to obtain a PSPL. It then segments the PSPL into subwords and, using the posterior probability relationship between each subword arc and its original word arc, converts the PSPL into the corresponding subword PSPL. Finally, the subword PSPL serves as the index for query retrieval, so that words act as the recognition unit and subwords as the retrieval unit. Experimental results show that the method exploits rich language information while largely resolving the incorrect boundary segmentation of word decoders, and that its retrieval performance is clearly better than the commonly used PSPL indexing method in which both the recognition unit and the retrieval unit are words.

(3) To address the inaccurate ranking of relevant documents in retrieval results, a re-ranking method based on acoustic feature similarity is proposed. The method improves the retrieval system with pseudo-relevance feedback. It first selects the top N spoken documents by relevance score from the first-pass results to form a pseudo-relevant document set. It then compares the acoustic feature similarity, within the time spans where the query occurs, between each retrieved document and the pseudo-relevant set. Finally, it fuses the original relevance score with the acoustic feature similarity to obtain a new relevance score and re-ranks the results by that score. Experimental results show that R-precision rises from 69.07% to 75.82% after re-ranking, and that retrieval performance improves further as the number of iterations increases.
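The re-ranking procedure in the third contribution can be illustrated with a minimal sketch. The abstract does not specify the acoustic similarity measure or the fusion rule, so everything below is an assumption for illustration only: each hit is reduced to a fixed-length acoustic feature vector for the query's time span, similarity is cosine similarity, relevance scores lie in [0, 1], and fusion is a linear interpolation with a hypothetical weight `alpha`. The names `cosine`, `rerank`, `top_n`, and `alpha` are not taken from the thesis.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rerank(hits, top_n=2, alpha=0.5):
    """Re-rank first-pass hits with pseudo-relevance feedback.

    hits: list of (doc_id, relevance, feature_vector) tuples, where the
    feature vector is an assumed fixed-length summary of the acoustic
    features in the time span where the query occurs.
    Returns a list of (doc_id, fused_score), best first.
    """
    # Step 1: the top-N first-pass hits form the pseudo-relevant set.
    pseudo = sorted(hits, key=lambda h: h[1], reverse=True)[:top_n]

    fused = []
    for doc_id, relevance, feat in hits:
        # Step 2: mean acoustic similarity to the pseudo-relevant set,
        # excluding the document itself so it cannot vouch for itself.
        others = [p for p in pseudo if p[0] != doc_id]
        if others:
            sim = sum(cosine(feat, p[2]) for p in others) / len(others)
        else:
            sim = 0.0
        # Step 3: assumed linear fusion of relevance and similarity.
        fused.append((doc_id, alpha * relevance + (1 - alpha) * sim))

    # Step 4: sort by the fused score.
    fused.sort(key=lambda item: item[1], reverse=True)
    return fused


# Toy first-pass results: "c" outranks "d" on relevance alone, but "d"
# is acoustically close to the top hits while "c" is not.
hits = [
    ("a", 0.9, [1.0, 0.0]),
    ("b", 0.8, [1.0, 0.1]),
    ("c", 0.7, [0.0, 1.0]),
    ("d", 0.6, [1.0, 0.0]),
]
print([doc_id for doc_id, _ in rerank(hits)])
```

In this toy input the fused score moves "d" ahead of "c". Real acoustic features are variable-length frame sequences, so a practical system would likely need an alignment-based similarity rather than cosine similarity on a single vector, and `alpha` would be tuned on held-out data.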
[Degree-granting institution]: PLA Information Engineering University
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TN912.34
