A Review of Research on Text Similarity Calculation Methods

Published: 2018-11-10 10:32

[Abstract]: [Objective] To analyze text similarity calculation methods and understand how the field is developing. [Literature scope] CNKI and Web of Science were searched with the queries "篇名:文本相似度 OR 篇名:词汇相似度 OR 篇名:语义相似度" (title: text similarity OR title: lexical similarity OR title: semantic similarity) and "TI: 'text similarity' OR 'semantic similarity' OR 'lexical similarity'" respectively, with the document type restricted, yielding 69 key papers. [Methods] Text similarity calculation methods are systematically reviewed, the basic ideas and characteristics of the key methods are analyzed, and future directions are summarized. [Results] A fairly comprehensive classification and description scheme is established. Text similarity calculation methods fall into four categories: string-based methods, corpus-based methods, world-knowledge-based methods, and other methods. Among these, neural-network-based and world-knowledge-based methods, together with similarity calculation for cross-domain text, are expected to become the development trend in this field. [Limitations] The discussion centers only on the methods themselves and does not further analyze how they are applied. [Conclusions] The review helps readers gain a comprehensive and in-depth understanding of the current state and future trends of research on text similarity calculation methods.
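[Author affiliations]: Chengdu Documentation and Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences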
[Classification number]: TP391.1
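
To make the first of the four categories named in the abstract more concrete, the following is a minimal sketch of two widely used string-based measures: edit (Levenshtein) distance normalized into a similarity score, and Jaccard similarity over character bigrams. The abstract does not say which string-based measures the review covers, so the choice of these two, the function names, and the normalization by the longer string's length are all assumptions made for illustration only.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Edit distance mapped to [0, 1]; dividing by the longer
    length is one common convention, assumed here."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def char_ngrams(text: str, n: int = 2) -> set:
    """Set of contiguous character n-grams in text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard_similarity(a: str, b: str, n: int = 2) -> float:
    """Size of the shared n-gram set divided by the size of the union."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    if not ga and not gb:
        return 1.0  # assumption: two strings with no n-grams count as identical
    return len(ga & gb) / len(ga | gb)
```

Both functions compare surface characters only, so two sentences with the same meaning but different wording score low. That gap is what the corpus-based and world-knowledge-based categories in the abstract are meant to address.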
