Research on Entity Information Query Expansion Algorithms in Object Retrieval
Published: 2019-07-10 08:34
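[Abstract]: This thesis studies entity information expansion algorithms for object retrieval. The demand for information has gradually shifted from relatively coarse web page retrieval to object retrieval, which has made entity information extraction one of the core technologies; entity information expansion is an important part of entity information extraction. The goal of entity information extraction is to automatically build an entity knowledge base containing the attributes related to each entity. The entity information query expansion studied here has two goals. The first is to enrich the entity query terms: when the query is incomplete, expanding it removes ambiguity and clarifies the query intent. The second is to expand coreferent information such as entity aliases, so that information about different mentions that refer to the same entity can be merged and shared.

The main work of the thesis is as follows.

First, object retrieval is compared with traditional information retrieval, with a focus on how entity information expansion and traditional query expansion differ and relate in preprocessing, term selection, relevance computation, and matching. On this basis, two research topics are set: entity information expansion based on statistical learning, and entity information expansion based on grammatical rules.

Second, for the problem of expanding terms that are highly relevant to an entity, the thesis proposes an entity information expansion method based on probability and statistics. It uses relevance feedback combined with hierarchical clustering to mine the co-occurrence relevance between entities and terms within the set of relevant documents, thereby expanding the entity's descriptive information. With this model, related terms were expanded for more than two thousand entities and applied to the TREC 2012 Microblog evaluation task; the results confirm the model's effectiveness.

Finally, for information such as entity aliases, synonyms, and identity descriptions, the thesis presents an entity information expansion method based on grammatical rules. After lexical-analysis preprocessing, coreference resolution is performed on entity mentions according to the grammatical features of coreferent expressions, which yields expansions such as entity aliases. The model achieved good results in two subtasks of TAC 2012 KBP, confirming its effectiveness.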
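The abstract does not give the scoring formula used for co-occurrence relevance, so the following is only a minimal sketch of the general idea, assuming document-level pointwise mutual information (PMI) weighted by the joint count as the relevance measure. The function name `expansion_terms`, its parameters, and the PMI weighting are illustrative assumptions, not the thesis's actual model.

```python
import math
from collections import Counter


def expansion_terms(entity, feedback_docs, top_k=5):
    """Rank candidate expansion terms for `entity` (hypothetical helper).

    feedback_docs: list of token lists, standing in for the documents
    returned by a relevance-feedback step. Scoring is an assumed
    count-weighted PMI, not the formula from the thesis.
    """
    n_docs = len(feedback_docs)
    doc_freq = Counter()   # documents containing each term
    co_freq = Counter()    # documents containing both the term and the entity
    entity_docs = 0        # documents containing the entity

    for tokens in feedback_docs:
        vocab = set(tokens)
        doc_freq.update(vocab)
        if entity in vocab:
            entity_docs += 1
            co_freq.update(vocab - {entity})

    if entity_docs == 0:
        return []

    scores = {}
    for term, joint in co_freq.items():
        # PMI = log( P(term, entity) / (P(term) * P(entity)) )
        pmi = math.log((joint * n_docs) / (doc_freq[term] * entity_docs))
        # Weight by the joint count so one-off co-occurrences do not dominate.
        scores[term] = joint * pmi

    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]
```

Only terms that appear in at least one document together with the entity can be returned, and an entity that never occurs in the feedback set yields an empty list.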
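[Figure: Agglomerative hierarchical clustering partition strategy.] All documents in the selected cluster serve as supporting information for the entity; in later steps, entity-specific information is extracted from these documents as the information expansion for that entity.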
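The clustering step described in the figure caption can be sketched as follows. This is a minimal illustration under stated assumptions: Jaccard similarity over token sets, average linkage, and a fixed stopping threshold are choices made here for brevity, and `agglomerative_clusters`, `support_cluster`, and `min_similarity` are hypothetical names. The source does not specify the similarity measure or the rule for choosing the supporting cluster.

```python
def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def agglomerative_clusters(docs, min_similarity=0.2):
    """Bottom-up clustering of documents (lists of tokens).

    Start with one cluster per document and repeatedly merge the most
    similar pair (average linkage) until no pair reaches min_similarity.
    Returns clusters as sorted lists of document indices.
    """
    token_sets = [set(d) for d in docs]
    clusters = [[i] for i in range(len(docs))]

    def avg_link(c1, c2):
        total = sum(jaccard(token_sets[i], token_sets[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                sim = avg_link(clusters[x], clusters[y])
                if sim > best_sim:
                    best_sim, best_pair = sim, (x, y)
        if best_sim < min_similarity:
            break  # remaining clusters are too dissimilar to merge
        x, y = best_pair
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]

    return [sorted(c) for c in clusters]


def support_cluster(docs, entity, min_similarity=0.2):
    """Assumed selection rule: the cluster with the most documents
    that mention the entity becomes its supporting document set."""
    clusters = agglomerative_clusters(docs, min_similarity)
    return max(clusters, key=lambda c: sum(entity in docs[i] for i in c))
```

The pairwise search makes this quadratic per merge, which is acceptable for a small feedback set; a production system would more likely rely on an existing clustering library.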
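[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree conferred]: 2014
[Classification number]: TP393.092; TP391.3
Article ID: 2512481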
Article link: https://www.wllwen.com/guanlilunwen/ydhl/2512481.html