Research on a Multi-Feature Sentence Similarity Calculation Method Incorporating Word Vectors

Published: 2018-06-14 09:07

Topics: word vectors + sentence similarity; Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), 2017, Issue 04

[Abstract]: After reviewing common sentence similarity calculation methods, this paper trains a word vector model for semantic similarity calculation on more than 34,000 texts from the People's Daily (《人民日报》) and designs a multi-feature sentence similarity calculation method that incorporates word vectors. At the word level, the method considers the number of overlapping words in the two sentences and the continuity of those words, and uses the word vector model to measure the similarity between non-overlapping words. At the structural level, it considers the word order of the overlapping words and the length consistency of the two sentences. In the experiments, four sentence similarity calculation methods are designed and implemented, and a corresponding experimental system is developed. The results show that the proposed algorithm achieves relatively good results, and that combining and optimizing the semantic features of the words and the structural features of the sentences improves the accuracy of sentence similarity calculation.
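The abstract names five features: overlapping word count, word continuity, word-vector similarity of non-overlapping words, word order, and length consistency. It does not say how each feature is computed or weighted. The Python sketch below shows one plausible way to combine them. Every formula in it is an assumption for illustration and should not be read as the paper's published method: a Dice coefficient for overlap, the longest shared word run for continuity, best-match cosine for non-overlapping words, pairwise order agreement, and a length ratio. The same applies to the weights in `WEIGHTS` and to the toy vectors. Sentences are assumed to be already word-segmented (Chinese text would need a segmenter first). In practice, the vectors would come from a model trained on a corpus such as the People's Daily texts.

```python
import math
from typing import Dict, List, Sequence

# Assumed representations: a sentence is a list of already-segmented words,
# and the word-vector model is a plain dict mapping each word to its vector.
Sentence = Sequence[str]
Vectors = Dict[str, List[float]]

# Hypothetical feature weights (they sum to 1); not taken from the paper.
WEIGHTS = {"overlap": 0.25, "continuity": 0.15, "semantic": 0.3,
           "order": 0.15, "length": 0.15}

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def overlap_score(s1: Sentence, s2: Sentence) -> float:
    """Word overlap as a Dice coefficient over distinct words."""
    a, b = set(s1), set(s2)
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))

def continuity_score(s1: Sentence, s2: Sentence) -> float:
    """Longest run of consecutive shared words, relative to the shorter sentence."""
    best = 0
    for i in range(len(s1)):
        for j in range(len(s2)):
            k = 0
            while i + k < len(s1) and j + k < len(s2) and s1[i + k] == s2[j + k]:
                k += 1
            best = max(best, k)
    shorter = min(len(s1), len(s2))
    return best / shorter if shorter else 0.0

def semantic_score(s1: Sentence, s2: Sentence, vectors: Vectors) -> float:
    """Overlapping words count as 1.0; each non-overlapping word takes its best
    (non-negative) cosine against the other sentence's non-overlapping words."""
    def directed(src: Sentence, tgt: Sentence) -> float:
        src_set, tgt_set = set(src), set(tgt)
        candidates = [t for t in tgt if t in vectors and t not in src_set]
        total = 0.0
        for w in src:
            if w in tgt_set:
                total += 1.0
            elif w in vectors and candidates:
                total += max(0.0, max(cosine(vectors[w], vectors[t]) for t in candidates))
            # words missing from the vector model contribute 0.0
        return total / len(src)

    if not s1 or not s2:
        return 0.0
    return (directed(s1, s2) + directed(s2, s1)) / 2

def order_score(s1: Sentence, s2: Sentence) -> float:
    """Fraction of pairs of shared words that keep the same relative order in s2."""
    first_pos: Dict[str, int] = {}
    for i, w in enumerate(s2):
        first_pos.setdefault(w, i)
    seen = set()
    positions = []
    for w in s1:
        if w in first_pos and w not in seen:
            seen.add(w)
            positions.append(first_pos[w])
    n = len(positions)
    if n < 2:
        return float(n)  # no shared words -> 0.0, a single shared word -> 1.0
    agree = sum(1 for i in range(n) for j in range(i + 1, n) if positions[i] < positions[j])
    return agree / (n * (n - 1) / 2)

def length_score(s1: Sentence, s2: Sentence) -> float:
    """1.0 for equal lengths, falling towards 0.0 as the lengths diverge."""
    total = len(s1) + len(s2)
    return 1 - abs(len(s1) - len(s2)) / total if total else 0.0

def sentence_similarity(s1: Sentence, s2: Sentence, vectors: Vectors,
                        weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of the word-level and structure-level features."""
    features = {
        "overlap": overlap_score(s1, s2),
        "continuity": continuity_score(s1, s2),
        "semantic": semantic_score(s1, s2, vectors),
        "order": order_score(s1, s2),
        "length": length_score(s1, s2),
    }
    return sum(weights[name] * value for name, value in features.items())

if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for a trained word vector model.
    toy = {"cat": [1.0, 0.1, 0.0], "kitten": [0.9, 0.2, 0.0], "car": [0.0, 0.1, 1.0]}
    base = ["the", "cat", "sleeps"]
    print(sentence_similarity(base, ["the", "kitten", "sleeps"], toy))
    print(sentence_similarity(base, ["the", "car", "sleeps"], toy))
```

With the toy vectors, the "kitten" sentence scores higher than the "car" sentence. Overlap, continuity, order, and length are identical for the two pairs, so the gap comes entirely from the word-vector feature applied to the single non-overlapping word.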
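[Author affiliations]: Logistics Science Research Institute of the Chinese People's Liberation Army; School of Computer Science, Beihang University; School of Information Engineering and Automation, Kunming University of Science and Technology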
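[Funding]: National Natural Science Foundation of China No.61370126; National High Technology Research and Development Program of China (863 Program) No.2015AA016004; National Social Science Foundation of China No.15GJ003-154; Exploratory Independent Research Project Fund of the State Key Laboratory of Software Development Environment No.SKLSDE-2015ZX-16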
[Chinese Library Classification number]: TP391.1
