A Multi-Source Knowledge Base Entity Alignment Algorithm Based on Web Semantic Labels
Published: 2018-07-21 14:00
[Abstract]: Knowledge bases are an important data resource for many natural language processing tasks, but any single knowledge base has low coverage, and different knowledge bases are highly heterogeneous, which hinders data sharing and integration. Research on multi-source knowledge base fusion is therefore of considerable importance, and multi-source entity alignment is a key component of it. Driven by the development of the Semantic Web, much related work has been carried out abroad, but most of it targets English knowledge bases, and Chinese knowledge bases have received comparatively little attention. With the aim of fusing Chinese knowledge bases, this paper proposes a multi-source knowledge base entity alignment algorithm based on web semantic labels. The algorithm jointly uses attribute labels, category labels, and keywords extracted from unstructured text to align Chinese encyclopedia entities. Experiments show that the algorithm handles the multi-source entity alignment problem well: at a precision of nearly 95%, it still maintains a recall of nearly 55%. It has been deployed in a real system, where it meets the practical requirements of multi-source knowledge base entity alignment.
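The abstract names three kinds of evidence (attribute labels, category labels, and text keywords) but does not say how they are combined. The sketch below is one plausible reading, not the paper's method: each entity carries three label sets, each pair of sets is compared with Jaccard similarity, and the three scores are merged by a weighted sum with an acceptance threshold. The `Entity` class, the `WEIGHTS` and `THRESHOLD` values, and the function names are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entity:
    """An encyclopedia entry reduced to three label sets (hypothetical schema)."""
    name: str
    attributes: set[str] = field(default_factory=set)  # infobox attribute names
    categories: set[str] = field(default_factory=set)  # category labels
    keywords: set[str] = field(default_factory=set)    # keywords from body text


# Assumed values for illustration; the abstract gives no weights or threshold.
WEIGHTS = (0.4, 0.3, 0.3)
THRESHOLD = 0.5


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(x: Entity, y: Entity, weights=WEIGHTS) -> float:
    """Weighted sum of the three per-label-type Jaccard scores."""
    w_attr, w_cat, w_kw = weights
    return (
        w_attr * jaccard(x.attributes, y.attributes)
        + w_cat * jaccard(x.categories, y.categories)
        + w_kw * jaccard(x.keywords, y.keywords)
    )


def align(source: list[Entity], target: list[Entity], threshold=THRESHOLD):
    """For each source entity, keep its best target match if it clears the threshold."""
    pairs = []
    for s in source:
        best = max(target, key=lambda t: similarity(s, t), default=None)
        if best is not None and similarity(s, best) >= threshold:
            pairs.append((s.name, best.name))
    return pairs
```

In a scheme like this, raising the threshold would be expected to trade recall for precision, which is the kind of operating point the abstract reports (high precision at moderate recall).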
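[Affiliation]: National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
[Funding]: National Natural Science Foundation of China (61533018); National Basic Research Program of China ("973" Program) (2014CB340503); CCF-Tencent Rhino-Bird Fund
[Classification No.]: TP391.1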
Article ID: 2135758
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2135758.html