Research on Entity and Relation Extraction for Toponym Ontology
Published: 2018-08-01 13:54
[Abstract]: In recent years, emergencies have occurred frequently, and emergency management has become increasingly important. Emergency management involves fusing data from many sources, and how to provide the relevant data quickly and accurately is a problem in urgent need of study. With the development of the Internet, online data is growing exponentially, and this data contains much of the information that emergency management needs. Toponym information is the core anchor of emergency information. This thesis studies entity and relation extraction for a toponym ontology: it extracts toponym-related entities and the relations between them, laying a core foundation for the extraction and semantic annotation of emergency data.
Entity and relation extraction correspond to named entity recognition and relation extraction in natural language processing. The mainstream approaches today are rule-based methods and machine-learning-based methods. During extraction, this thesis applies rule-based or machine-learning-based methods as appropriate to the characteristics of the entities and relations in the source text.
Because no established corpus exists for extraction in the toponym domain, the thesis first defines an entity scheme and a relation scheme for toponym ontology extraction, then builds the corpora needed for entity extraction and relation extraction according to the features of interest, and describes the corpus construction process in detail. Toponym ontology entities are classified by the patterns in which they appear in the source text and are handled either by rule-based methods or by machine learning with a maximum entropy model. Extraction rules are first summarized for four categories of toponym ontology entities; for the remaining categories, the features used in machine learning are analyzed, and entities are extracted with maximum entropy on the annotated corpus. For relation extraction, the characteristics of the relations are analyzed first, and a feature-vector-based method with an SVM is used to extract relations. Based on the characteristics of the corpus, a rule-based method for extracting toponym ontology relations is also proposed. In addition, rules are formulated from the characteristics of the relations to derive implicit relations from existing ones, further enriching the toponym ontology relation base.
Finally, a platform for toponym ontology entity and relation extraction is designed and implemented, and the extracted data is applied in a working semantic toponym search engine. In practice, the extracted entity and relation data greatly improved the user experience and helped users obtain toponym-related data more conveniently, quickly, and accurately.
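The abstract mentions deriving implicit relations from existing ones by rule but does not state the rules themselves. As a minimal sketch of what one such rule could look like, the Python below assumes a hypothetical transitive relation named `located_in` and infers the triples that follow from it. The relation name, the function `infer_located_in`, and the sample triples are illustrative assumptions, not the thesis's actual rules or data.

```python
from collections import defaultdict


def infer_located_in(triples):
    """Return (head, 'located_in', ancestor) triples implied by transitivity.

    Only triples that are not already stated explicitly are returned.
    'located_in' is a hypothetical relation name used for illustration.
    """
    # Map each place to the places it is directly stated to be located in.
    parents = defaultdict(set)
    for head, rel, tail in triples:
        if rel == "located_in":
            parents[head].add(tail)

    inferred = set()
    for start in list(parents):
        # Depth-first walk over all ancestors reachable from `start`.
        stack = list(parents[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(parents.get(node, ()))
        # Keep only ancestors that were not stated directly.
        for ancestor in seen:
            if ancestor != start and ancestor not in parents[start]:
                inferred.add((start, "located_in", ancestor))
    return inferred


# Illustrative input triples (assumed, not taken from the thesis corpus).
known = [
    ("Nankai District", "located_in", "Tianjin"),
    ("Tianjin", "located_in", "China"),
    ("Tianjin University", "located_in", "Nankai District"),
]
derived = infer_located_in(known)
for triple in sorted(derived):
    print(triple)
```

A rule of this kind only adds triples that follow from relations already extracted, so the quality of the derived relations depends entirely on the quality of the extracted ones.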
[Degree-granting institution]: Tianjin University
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: TP391.1
Article ID: 2157790
Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/2157790.html