Word Similarity Computation and Its Application in Question Answering Systems

Published: 2018-01-19 06:26

  Keywords: HowNet, sememe vector, PageRank, word similarity, question answering system  Source: Zhengzhou University, 2017 master's thesis  Document type: degree thesis


[Abstract]: With the arrival of the big data era, the Internet produces a large volume of text every day. Words are the basic units of text, so understanding word semantics is one of the foundational tasks of text processing. Word similarity computation expresses how similar two words are as a single numerical value and is a principal means of understanding word semantics. Progress on word similarity will advance related applications in natural language processing, such as question answering, information retrieval, word sense disambiguation, and machine translation. Building on a study of existing research methods for word similarity computation and question answering systems, this thesis proposes a word similarity computation method based on sememe vectors and studies its application in a knowledge base question answering system. The main contributions are as follows. (1) A sememe vector generation model, SIC_PageRank, is proposed. In the hierarchical graph formed by the hypernym-hyponym relations among HowNet sememes, the Sememe Information Content (SIC) of each sememe is computed from the depth information of the sememe and its descendant sememe nodes. A PageRank transition probability matrix is then built from the SIC values and the connections in the sememe graph, and a vector representation of each sememe is generated iteratively following the PageRank idea. (2) A word similarity computation method based on sememe vectors is proposed. Sememe vectors are generated with the SIC_PageRank model; sememe similarity is computed as the cosine similarity of sememe vectors, concept similarity is computed from sememe similarity, and word similarity is then computed in turn. The method is applied to automatic identification of noun semantic classes in the Modern Chinese Semantic Dictionary (现代汉语语义词典). Experimental results show a 71.9% agreement rate with manually proofread results, better than the method based on shortest path distance. (3) Word similarity computation is applied in a knowledge base question answering system. Word similarity is used to compute the similarity between the predicate of the question and the predicate of each candidate answer. This score is combined with features such as the edit distance between the two predicates, word co-occurrence, and classification, and the learning-to-rank algorithm Ranking SVM ranks the candidate answers. Experiments on the dataset of the NLPCC2016 knowledge base question answering evaluation task show that applying the sememe-vector word similarity method yields a precision of 73.88%, a recall of 82.29%, and an average F1 of 75.88% in answer identification, all three higher than those of the method using word2vec word vectors.
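The pipeline in points (1) and (2) can be illustrated with a minimal Python sketch. The abstract does not give the SIC formula, the exact construction of the transition matrix, or the rule that aggregates sememe similarity into concept and word similarity, so everything below is an assumption made for illustration: the toy hierarchy is invented (it is not HowNet data), `node_weights` is a stand-in for SIC, the vectors come from a personalized PageRank restarted at each sememe, and `word_similarity` simply takes the best-matching pair while treating each concept as a single sememe. It is not the thesis's SIC_PageRank model.

```python
import math

# Hypothetical toy hypernym -> hyponym hierarchy; NOT real HowNet data.
CHILDREN = {
    "entity": ["thing", "event"],
    "thing": ["animal", "plant"],
    "animal": ["human", "beast"],
    "event": [], "plant": [], "human": [], "beast": [],
}
NODES = list(CHILDREN)
INDEX = {name: i for i, name in enumerate(NODES)}


def count_descendants(name):
    """Number of nodes strictly below `name` in the hierarchy."""
    return sum(1 + count_descendants(child) for child in CHILDREN[name])


def node_weights(root="entity"):
    """Assumed stand-in for SIC: deeper nodes with fewer descendants weigh more."""
    depth = {root: 1}
    queue = [root]
    while queue:  # breadth-first walk to assign depths
        parent = queue.pop(0)
        for child in CHILDREN[parent]:
            depth[child] = depth[parent] + 1
            queue.append(child)
    return {n: depth[n] / (1 + count_descendants(n)) for n in NODES}


def transition_rows(weights):
    """Row-stochastic matrix: step to a parent or child, proportional to its weight."""
    neighbours = {n: list(CHILDREN[n]) for n in NODES}
    for parent, kids in CHILDREN.items():
        for kid in kids:
            neighbours[kid].append(parent)
    rows = []
    for n in NODES:
        total = sum(weights[m] for m in neighbours[n])
        row = [0.0] * len(NODES)
        for m in neighbours[n]:
            row[INDEX[m]] = weights[m] / total
        rows.append(row)
    return rows


def sememe_vector(name, rows, damping=0.85, iterations=50):
    """Personalized PageRank restarted at `name`; the final scores are the vector."""
    size = len(NODES)
    restart = [0.0] * size
    restart[INDEX[name]] = 1.0
    rank = restart[:]
    for _ in range(iterations):
        rank = [
            (1 - damping) * restart[j]
            + damping * sum(rank[i] * rows[i][j] for i in range(size))
            for j in range(size)
        ]
    return rank


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def word_similarity(sememes_a, sememes_b, vectors):
    """Assumed aggregation: best cosine over all sememe pairs of the two words."""
    return max(cosine(vectors[a], vectors[b]) for a in sememes_a for b in sememes_b)


ROWS = transition_rows(node_weights())
VECTORS = {name: sememe_vector(name, ROWS) for name in NODES}
print(cosine(VECTORS["human"], VECTORS["beast"]))
print(cosine(VECTORS["human"], VECTORS["plant"]))
```

On this toy graph the sibling pair human/beast scores higher than human/plant, which is the qualitative behavior one would expect from vectors derived from hierarchy structure. A real implementation would replace the toy graph with the HowNet sememe hierarchy and the placeholder weights with the SIC definition given in the thesis.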

[Degree-granting institution]: Zhengzhou University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP391.1

