
Chinese Entity Linking Based on a Context-Based Multi-Feature Graph Model

Published: 2018-08-25 15:21
[Abstract]: With the growth of online information and the increasing demand for semantic search, knowledge base population has become a hot topic in natural language processing research. Entity linking, the process of correctly linking entity mentions in text to the corresponding entities in a knowledge base, is a core technology of knowledge base population and has significant theoretical and practical value. Most current entity linking work targets English, and research on Chinese is still at an early stage. The main reasons are: (1) the lack of a unified, authoritative open-source Chinese knowledge base and corpus; (2) Chinese entity extraction depends on Chinese word segmentation, and because Chinese is semantically rich and grammatically flexible, disambiguation is harder than in English, so Chinese systems still operate mostly at the surface form of named entities and do not capture the semantic information of entities well. To address these problems, this thesis builds on current mainstream English entity linking techniques and the state of Chinese research, and proposes a context-based multi-feature graph model. (1) Chinese Wikipedia is used as the knowledge base for the entity linking task, and Chinese corpus information is extracted from the official evaluation data of the KBP (Knowledge Base Population) track of the TAC (Text Analysis Conference) organized by NIST (National Institute of Standards and Technology) to build the corpus and the experimental datasets. (2) Starting from both the context of each entity mention and the Wikipedia database, multiple features between entities are extracted and quantified as semantic similarity. The semantic similarity is then fused into a constructed graph model, and the topic consistency of the graph model is exploited to rank the candidate entities and complete the linking, with the aim of improving the accuracy of Chinese word segmentation and enriching the semantic information of entities. To evaluate the method, the most recent Chinese entity linking method was reproduced for comparison. Experimental results show that the proposed method effectively improves both the accuracy and the efficiency of entity linking and achieves good overall performance.
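The abstract describes the ranking pipeline only at a high level: features from the mention context and from Wikipedia are turned into semantic similarities, fused into a graph, and candidates are ranked by topic consistency. The following is a minimal Python sketch of that general idea under stated assumptions, not the thesis's implementation. It assumes bag-of-words cosine similarity as the context feature, the Milne-Witten inlink measure as the Wikipedia feature, and a random walk with restart (personalized PageRank) as the graph ranking step; these are standard techniques swapped in for illustration. The function names, the `alpha` and `iters` parameters, and the toy data are hypothetical.

```python
# Minimal sketch under stated assumptions; not the thesis's implementation.
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-frequency vectors."""
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def link_relatedness(in_a: set, in_b: set, total_articles: int) -> float:
    """Milne-Witten relatedness of two Wikipedia articles from their sets of
    incoming links, clipped to [0, 1] (assumed Wikipedia-side feature)."""
    common = len(in_a & in_b)
    if common == 0:
        return 0.0
    big, small = max(len(in_a), len(in_b)), min(len(in_a), len(in_b))
    denom = math.log(total_articles) - math.log(small)
    if denom <= 0:
        return 0.0
    dist = (math.log(big) - math.log(common)) / denom
    return max(0.0, 1.0 - dist)


def link_mentions(mentions, inlinks, total_articles, alpha=0.15, iters=50):
    """Rank candidates with a random walk with restart over a candidate graph.

    mentions: {mention: (context Counter, {candidate: description Counter})}
    inlinks:  {candidate: set of article ids that link to it}
    Returns {mention: highest-scoring candidate}.
    """
    nodes = [(m, c) for m, (_, cands) in mentions.items() for c in cands]
    n = len(nodes)
    # Restart vector: each candidate's context similarity, normalised to sum 1.
    prior = [cosine(mentions[m][0], mentions[m][1][c]) for m, c in nodes]
    total = sum(prior)
    prior = [p / total for p in prior] if total else [1.0 / n] * n
    # Transition rows: relatedness to candidates of *other* mentions,
    # row-normalised; rows without edges fall back to the restart vector.
    trans = []
    for mi, ci in nodes:
        row = [link_relatedness(inlinks[ci], inlinks[cj], total_articles)
               if mi != mj else 0.0 for mj, cj in nodes]
        s = sum(row)
        trans.append([x / s for x in row] if s else prior[:])
    # Power iteration: r <- alpha * prior + (1 - alpha) * r P
    r = prior[:]
    for _ in range(iters):
        r = [alpha * prior[j]
             + (1 - alpha) * sum(r[i] * trans[i][j] for i in range(n))
             for j in range(n)]
    # Keep the best-scoring candidate for each mention.
    best = {}
    for (m, c), score in zip(nodes, r):
        if m not in best or score > best[m][1]:
            best[m] = (c, score)
    return {m: c for m, (c, _) in best.items()}


# Hypothetical toy example: two mentions sharing one context.
ctx = Counter(["jordan", "scored", "bulls", "nba", "basketball", "game"])
mentions = {
    "Jordan": (ctx, {
        "Michael_Jordan": Counter(["basketball", "player", "nba", "bulls"]),
        "Jordan_(country)": Counter(["country", "middle", "east", "amman"]),
    }),
    "Bulls": (ctx, {
        "Chicago_Bulls": Counter(["basketball", "team", "nba", "chicago"]),
        "Bull": Counter(["animal", "cattle", "male"]),
    }),
}
inlinks = {
    "Michael_Jordan": {1, 2, 3, 4, 5},
    "Chicago_Bulls": {3, 4, 5, 6},
    "Jordan_(country)": {7, 8, 9},
    "Bull": {10, 11},
}
print(link_mentions(mentions, inlinks, total_articles=1000))
# {'Jordan': 'Michael_Jordan', 'Bulls': 'Chicago_Bulls'}
```

In this sketch the restart probability `alpha` keeps each candidate's score anchored to its own context evidence, while the walk over relatedness edges rewards candidates that are coherent with the candidates of other mentions in the same text, which is one common way to exploit the topic consistency the abstract refers to.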
[Degree-granting institution]: Taiyuan University of Technology
[Degree level]: Master's
[Year of degree]: 2017
[Classification number]: TP391.1

Article ID: 2203297


Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2203297.html

