Research on Fast Speech Sample Retrieval Based on Template Matching

Published: 2018-09-03 11:18

[Abstract]: Speech sample retrieval is the process of searching a large collection of speech resources for segments related to a query sample (i.e., a waveform fragment) supplied by the user, and returning those segments. It has important applications in information security, voice search engines, and the classification and management of speech resources. Template matching is one of the mainstream approaches to speech sample retrieval. Applied directly, however, it is time-consuming and does not adequately account for variation in acoustic conditions. To address these shortcomings, this thesis studies how to reduce retrieval time and how to re-rank relevant regions, with the aim of speeding up retrieval and improving retrieval accuracy. The main work falls into three parts.
First, searching for relevant regions by applying dynamic time warping (DTW) directly is very time-consuming. To address this, a DTW algorithm incorporating a piecewise aggregate approximation (PAA) lower-bound estimate is proposed. It speeds up retrieval by greatly reducing the number of dynamic matching operations performed during the search for relevant regions. The method first computes the PAA lower-bound estimate of the DTW score between the query sample and each matching region of the test utterance, and then uses K-nearest-neighbor search together with DTW to find the regions related to the query sample. Experiments show that the method is 5.9 times as fast as direct DTW retrieval, with no effect on retrieval accuracy.
Second, direct DTW retrieval involves a large amount of redundant computation and redundant matching. To address this, a speech sample retrieval method based on segmental DTW is proposed: the test utterance is divided into a series of matching regions according to certain rules, and DTW is then used for retrieval. To further improve efficiency, the segmental DTW algorithm is combined with the PAA lower-bound estimate. In addition, to better account for variation in acoustic conditions, virtual relevance feedback is used to refine the retrieval results, and a relevant-region re-ranking method based on virtual similarity is proposed. Experiments show that the method is 14.6 times as fast as direct DTW retrieval, and that its retrieval accuracy is 5.21% higher.
Third, to overcome the limitations of both the DTW algorithm with lower-bound estimation and the segmental DTW algorithm with lower-bound estimation, a DTW algorithm incorporating boundary information is proposed. The method first uses hierarchical agglomerative clustering to segment the phoneme posterior probability feature sequences of the query sample and the test utterances (i.e., boundary detection), computes the mean vector of each segment, and assembles these mean vectors into a new index and a new query. DTW is then used for retrieval, and finally virtual relevance feedback is used to refine the results. Experiments show that the method is 15.4 times as fast as direct DTW retrieval, and that its retrieval accuracy is 0.73% higher.
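The lower-bound-pruned search described in the first contribution can be illustrated with a minimal Python sketch. The abstract does not give the form of the lower-bound estimate, so the code below swaps in a much simpler bound (the sum of the first-frame and last-frame distances, which every warping path must include) purely to show how K-nearest-neighbor search and DTW combine. The function names, the Euclidean frame distance, the fixed-width sliding windows, and the `hop` parameter are all assumptions made for illustration, not details taken from the thesis.

```python
import heapq
import numpy as np


def frame_dist(a, b):
    """Euclidean distance between two feature frames (assumed local cost)."""
    return float(np.linalg.norm(a - b))


def dtw_cost(q, x):
    """Unnormalised DTW cost between sequences q (n x d) and x (m x d)."""
    n, m = len(q), len(x)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = frame_dist(q[i - 1], x[j - 1]) + step
    return float(acc[n, m])


def lower_bound(q, x):
    """Stand-in bound: every warping path contains the first and last cells."""
    first = frame_dist(q[0], x[0])
    if len(q) == 1 and len(x) == 1:
        return first  # the first and last cells coincide
    return first + frame_dist(q[-1], x[-1])


def knn_search(query, utterance, k=3, hop=1):
    """Return (k best (cost, start) pairs sorted by cost, number of DTW calls)."""
    width = len(query)
    starts = range(0, len(utterance) - width + 1, hop)
    # Visit candidate regions in order of increasing lower bound.
    ranked = sorted((lower_bound(query, utterance[s:s + width]), s) for s in starts)
    best = []  # max-heap via negated cost: the k smallest costs seen so far
    dtw_calls = 0
    for bound, s in ranked:
        if len(best) == k and bound >= -best[0][0]:
            break  # no remaining region can beat the current k-th best cost
        cost = dtw_cost(query, utterance[s:s + width])
        dtw_calls += 1
        if len(best) < k:
            heapq.heappush(best, (-cost, s))
        elif cost < -best[0][0]:
            heapq.heapreplace(best, (-cost, s))
    return sorted((-c, s) for c, s in best), dtw_calls
```

Because a region is skipped only when its lower bound already reaches the current K-th best DTW cost, the costs returned match those of an exhaustive search. This is consistent with the abstract's statement that pruning leaves retrieval accuracy unchanged. How many DTW computations are avoided depends on how tight the bound is, and this stand-in is considerably looser than a PAA-based estimate would be.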
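[Degree-granting institution]: PLA Information Engineering University (解放军信息工程大学)
[Degree level]: Master's
[Year degree awarded]: 2013
[Classification number]: TN912.3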
