Research on Wikipedia-Based Entity Linking Algorithms and System Implementation
[Abstract]: The Internet has entered an age of information explosion: the volume of information is huge, its forms are diverse, and its content is complex. How to retrieve the information users need from this mass of data is an urgent problem. Natural language, however, is pervasively ambiguous. Entity ambiguity is the linguistic phenomenon in which the same entity mention refers to different real-world entities in different contexts. Disambiguating entities helps in understanding text, and entity linking is the task of linking the names of people, places, and organizations that appear in web pages, Weibo posts, or dialogue to the correct corresponding entities in a knowledge base. Resolving the entity ambiguity caused by synonymy and polysemy is of great significance for information retrieval, automatic question answering, and knowledge base completion. This thesis studies the core problem of entity linking: ranking the candidate entities for an entity mention. The main work and contributions are summarized as follows: 1. A candidate entity ranking algorithm combining LDA with random walk with restart, and a candidate entity ranking algorithm combining Word2Vec with PageRank, are proposed to improve the accuracy of entity linking. Traditional candidate entity ranking algorithms often stop at feature extraction: they need to extract a large number of features and then train a model by supervised learning, which is cumbersome, and the features are often shallow ones, such as string similarity, that ignore the semantic similarity between entities. This thesis instead exploits the link structure of Wikipedia, on the observation that entities on the same topic tend to link to each other, as do entities that are more semantically related.
To address this problem, the thesis proposes a candidate entity ranking algorithm that combines LDA with random walk with restart, and one that combines Word2Vec with PageRank. Both algorithms use the Wikipedia graph structure in which the entities are located. Random walk with restart yields a vector for each candidate entity, and PageRank yields a PR value for each candidate entity. The former incorporates the entities' feature vectors over topics, and the latter incorporates the semantic similarity between entities; both thus add semantic features to the graph model. Experimental results show that the accuracy of entity linking improves compared with mainstream candidate entity ranking algorithms. 2. Combining the two candidate entity ranking algorithms, an entity linking system (LEL) is developed. The system can link entities in text to the Wikipedia knowledge base and is highly interactive.
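The abstract does not spell out how semantic features enter the graph model, so the following is only a minimal sketch of the general idea, not the thesis's algorithm. It runs a random walk with restart (equivalently, personalized PageRank) over a small weighted entity graph and ranks the candidate entities by their resulting scores. The toy graph, the entity names, the edge weights (standing in for topic or embedding similarity), the restart probability `alpha`, and the function name `random_walk_with_restart` are all assumptions made for illustration.

```python
def random_walk_with_restart(adj, restart, alpha=0.15, tol=1e-10, max_iter=1000):
    """Score nodes by a random walk that restarts with probability alpha.

    adj:     dict node -> dict neighbour -> non-negative edge weight.
             Every neighbour must itself be a key of adj.
    restart: dict node -> restart probability (values sum to 1).
    Returns: dict node -> score; the scores sum to 1.
    """
    nodes = list(adj)
    # Row-normalise edge weights into transition probabilities.
    trans = {}
    for u in nodes:
        total = sum(adj[u].values())
        trans[u] = {v: w / total for v, w in adj[u].items()} if total > 0 else {}

    scores = {u: restart.get(u, 0.0) for u in nodes}
    for _ in range(max_iter):
        # With probability alpha the walker jumps back to the restart set.
        new = {u: alpha * restart.get(u, 0.0) for u in nodes}
        for u in nodes:
            if trans[u]:
                for v, p in trans[u].items():
                    new[v] += (1 - alpha) * scores[u] * p
            else:
                # Dangling node: send its mass back to the restart distribution.
                for v in nodes:
                    new[v] += (1 - alpha) * scores[u] * restart.get(v, 0.0)
        delta = sum(abs(new[u] - scores[u]) for u in nodes)
        scores = new
        if delta < tol:
            break
    return scores


# Hypothetical toy graph: edge weights stand in for semantic similarity
# (for example, cosine similarity of topic or word vectors).
graph = {
    "Michael_Jordan": {"Chicago_Bulls": 1.0, "NBA": 1.0},
    "Michael_I._Jordan": {"Machine_learning": 1.0},
    "Chicago_Bulls": {"Michael_Jordan": 1.0, "NBA": 1.0},
    "NBA": {"Michael_Jordan": 1.0, "Chicago_Bulls": 1.0},
    "Machine_learning": {"Michael_I._Jordan": 1.0},
}

# The walk restarts at the entities already identified in the mention's context.
context = {"Chicago_Bulls": 0.5, "NBA": 0.5}
candidates = ["Michael_I._Jordan", "Michael_Jordan"]

scores = random_walk_with_restart(graph, context)
ranked = sorted(candidates, key=scores.get, reverse=True)
print(ranked)  # ['Michael_Jordan', 'Michael_I._Jordan']
```

In this sketch the context entities define the restart distribution, so candidates that are well connected to the context accumulate more probability mass. A uniform restart distribution would turn the same loop into ordinary weighted PageRank.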
[Degree-granting institution]: East China Normal University
[Degree level]: Master's
[Year degree conferred]: 2016
[Classification number]: TP391.1