A Knowledge Acquisition Method for Unstructured Information in Chinese Online Encyclopedias

Published: 2018-01-01 06:27

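Keywords: a knowledge acquisition method for unstructured information in Chinese online encyclopedias. Source: Library and Information Service (《图书情报工作》), 2016, Issue 13. Paper type: journal article.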

[Abstract]: [Purpose/Significance] When building a large-scale knowledge base, manual construction is inefficient and hardly feasible, so automatically acquiring massive amounts of knowledge from online encyclopedias has drawn growing attention from researchers. Existing work mainly extracts knowledge from English online encyclopedias, whereas knowledge extraction from Chinese encyclopedia sources is still in its infancy. [Method/Process] To address the construction of large-scale Chinese knowledge bases, the paper proposes a new method for automatically constructing a large-scale knowledge base based on the architecture of Chinese online encyclopedias. In the first stage, the semantic relations between the subject and the object of knowledge triples are learned through self-expansion. In the second stage, a collaborative classifier combining conditional random fields (CRF) and support vector machines (SVM) predicts the semantic relations between the annotated attributes and the attribute-value entities. [Results/Conclusions] Experimental evaluation shows that, compared with previous work, the method improves the precision and recall of entity recognition on typical Chinese encyclopedia category pages by up to about 10% and 6%, respectively.
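The first stage named in the abstract, self-expanding learning of subject-object relations, can be pictured with a toy sketch. This is not the authors' algorithm: the function names, the English example sentences, and the exact-string pattern matching are all assumptions made for illustration, and the second-stage CRF/SVM classifier is not modeled at all. The sketch starts from one seed (subject, object) pair, learns the text between them as a pattern, uses the pattern to find new pairs, and repeats.

```python
# Toy bootstrapping over (subject, object) pairs; illustrative only,
# not the method from the paper. All names and data are hypothetical.

SENTENCES = [
    "Beijing is the capital of China",
    "Paris is the capital of France",
    "Paris, capital city of France",
    "Tokyo, capital city of Japan",
]


def learn_patterns(sentences, pairs):
    """Collect the text found between a known subject and object."""
    patterns = set()
    for s in sentences:
        for subj, obj in pairs:
            has_middle = len(s) > len(subj) + len(obj)
            if s.startswith(subj) and s.endswith(obj) and has_middle:
                patterns.add(s[len(subj):len(s) - len(obj)])
    return patterns


def apply_patterns(sentences, patterns):
    """Split each sentence on a known pattern to propose new pairs."""
    pairs = set()
    for s in sentences:
        for p in patterns:
            left, sep, right = s.partition(p)
            if sep and left and right:
                pairs.add((left, right))
    return pairs


def bootstrap(sentences, seed_pairs, rounds=3):
    """Alternate pattern learning and pair extraction until no growth."""
    pairs = set(seed_pairs)
    patterns = set()
    for _ in range(rounds):
        patterns |= learn_patterns(sentences, pairs)
        new_pairs = apply_patterns(sentences, patterns) - pairs
        if not new_pairs:
            break
        pairs |= new_pairs
    return pairs, patterns


if __name__ == "__main__":
    found_pairs, found_patterns = bootstrap(SENTENCES, {("Beijing", "China")})
    for subj, obj in sorted(found_pairs):
        print((subj, "capital_of", obj))
```

Each extracted pair is printed as a (subject, relation, object) triple. A realistic system would need entity recognition and pattern scoring to keep noisy patterns from spreading across rounds, which this sketch deliberately omits.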
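[Author affiliation]: School of Information, Capital University of Economics and Business
[Funding]: This work is one of the research outputs of the Capital University of Economics and Business research project "Research on Key Technologies for Constructing Chinese Linked Data" (project no. 00791654490223) and the Beijing Social Science Fund project "Research on the Influence of Micro-media on Changes in the Behavior Patterns of Beijing University Students" (project no. 15ZHB011).
[Classification number]: TP391.1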
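[Body snapshot]: 1 Introduction. To realize the vision of the "Web of data", in which semantic information published on the Internet can be understood by machines, researchers have done a great deal of work and have built and published many semantic data sets in different domains on the Web [1]. Meanwhile, in linked data, as advocated and initiated by T. Berners-Lee (linked open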
