Design and Implementation of Person Relationship Construction Based on a Knowledge Graph

Published: 2018-02-11 11:59

Keywords: OrientDB; knowledge graph; ontology　Source: Chongqing University, master's thesis, 2016　Document type: degree thesis

[Abstract]: Public security intelligence work centers on the relationships among people, organizations, and accounts. In practice, analysts often need to retrieve everything related to a single name: whom a person has recently contacted, which activities the person has taken part in, which social accounts the person has used, and so on. Answering such questions usually means searching manually through massive volumes of information. This thesis therefore proposes building a knowledge graph that describes the relationships among people, so that querying person relationships in the graph returns a person's basic information, activity trajectory, and related persons. Applying a knowledge graph to person-relationship queries makes intelligence work easier, but designing and constructing the graph is the hard part. Much existing research assumes that the raw data has already been cleaned, that person relationships have already been converted into triples, or even that the knowledge graph has already been built, and concentrates on analysis methods and application scenarios. The main work of this thesis therefore focuses on the process from raw data to a finished person knowledge graph; on the application side, the graph only needs to support person-relationship queries. Designing and constructing a person-relationship knowledge graph raises three main difficulties. First, the raw data is very large and its structures are entirely heterogeneous: how can the objects of interest (people, organizations, accounts, and so on) be extracted from it, and how can it be decided whether two people are related? Second, for knowledge graph updates, how can it be determined whether a newly added person already exists in the graph, and if so, how should that person's information be merged? Third, person relationships span more than a thousand relationship types, including person-person, person-organization, person-website, and person-account: how should the data model for each object type be designed so that it describes both the object's basic information and the relationships between objects? The main contributions are as follows. (1) Building on ontology modeling, a person-relationship modeling method is proposed. Starting from the definitions of domain, class, attribute, and entity, these four data structures are designed in detail and used to guide the creation of a person attribute set and a person relationship set; the feasibility of the modeling scheme is verified in practice. (2) Building on natural-language word segmentation, a person entity extraction technique that incorporates multiple regular expressions is proposed. Experiments compare the Chinese word segmentation results of the Chinese Academy of Sciences segmenter and the Harbin Institute of Technology segmenter and analyze the differing characteristics of the two. The experiments also show that combining multiple regular expressions improves entity extraction and is especially well suited to recognizing account-type entities. (3) Three knowledge-graph-based application schemes are proposed: person-relationship search, semantic search, and scenario-based search, and their application scenarios are compared.
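The multi-regular-expression idea in contribution (2) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the abstract does not list its expressions, so the two patterns below (e-mail address and an 11-digit mobile number), the function names `extract_accounts` and `to_triples`, and the `uses_<kind>` relation labels are all hypothetical, and the person name is assumed to have been found already by a word segmenter.

```python
import re

# Hypothetical patterns for account-type entities; the thesis does not
# publish its actual expressions, so these are illustrative assumptions.
ACCOUNT_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # 11 digits starting with 1 and a second digit in 3-9, not embedded
    # in a longer digit run.
    "mobile": re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"),
}


def extract_accounts(text):
    """Run every pattern over the text and return (kind, value) pairs."""
    found = []
    for kind, pattern in ACCOUNT_PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((kind, match.group(0)))
    return found


def to_triples(person, text):
    """Turn extracted accounts into (subject, relation, object) triples.

    `person` is assumed to come from an earlier segmentation step.
    """
    return [(person, f"uses_{kind}", value)
            for kind, value in extract_accounts(text)]


if __name__ == "__main__":
    sample = "Zhang San wrote from zhangsan@example.com and left the number 13812345678."
    for triple in to_triples("Zhang San", sample):
        print(triple)
```

Keeping one pattern per account type makes it easy to add or tune a type without touching the others, which is one plausible reason such an approach suits account-like strings that a general-purpose segmenter tends to split apart.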
[Degree-granting institution]: Chongqing University
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: D035.3
