Research on Personal-Relationship Extraction Based on Wikipedia
[Abstract]: Personal-relationship extraction is an important topic in information extraction research. It originated in the evaluation tasks of the MUC conferences, which were later succeeded by the ACE evaluations. At present, most data used in studies of Chinese personal-relationship extraction are structured evaluation corpora from ACE or relatively standardized news corpora such as People's Daily. In practical applications, however, especially in the Internet era, people increasingly retrieve information about people, events, and so on from the Internet, and Wikipedia is one of the commonly used sources for such searches. Wikipedia is an open knowledge base that contains a wealth of personal-relationship information, and its text has the semi-structured character typical of web text. Wikipedia-based personal-relationship extraction is therefore closer to personal-relationship extraction in real-world settings.

The main idea of personal-relationship extraction is to recast it as a relationship classification problem. Traditional extraction methods are mainly based on knowledge bases, machine learning, or pattern matching; the machine-learning methods are mainly kernel-based or feature-vector-based classifiers. The two main difficulties in personal-relationship extraction are person-name recognition and relationship recognition. To address these difficulties, this thesis proposes corresponding solutions with the following innovations:

(1) To address the low recognition rate of transliterated foreign names by existing word segmentation tools during person-name recognition, this thesis extracts infobox data from Wikipedia and constructs a Wikipedia-based database of persons in Chinese. At the same time, a dictionary of transliterated foreign names based on Chinese Wikipedia is constructed from the link data in Wikipedia.

(2) A hierarchical classification method that combines pattern matching with the feature-vector method is proposed to classify personal relationships, and the DAG-SVMs method is used to handle the multi-class classification problem. To improve the execution speed and performance of the classification model, a division of personal relationships into categories is introduced, and the phenomenon in Wikipedia where the same person appears under different names is alleviated.

The feasibility of the method is verified by experiments. Using the proposed method, this thesis constructs a large Wikipedia-based database of persons and a dictionary of names. The experimental results also show that the proposed method performs well in recognizing personal relationships, especially in classifying personal relationships and family relationships.
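The DAG-SVMs step named in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's features, relation labels, or SVM configuration, so everything below is an assumption made for illustration: the labels (`family`, `colleague`, `friend`), the two-dimensional toy feature vectors, and the helper names are hypothetical, and each pairwise decider is a nearest-centroid stand-in rather than a trained binary SVM. What the sketch does show is the decision-DAG procedure itself: one binary decider per label pair, and a walk that eliminates one label per step.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
# A pairwise decider returns whichever of its two labels it prefers for x.
PairwiseDecider = Callable[[Vector], str]


def centroid(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_pairwise(
    data: Dict[str, List[Vector]]
) -> Dict[Tuple[str, str], PairwiseDecider]:
    """Build one binary decider per unordered label pair.

    Stand-in only: a real DAG-SVM would train a binary SVM for each pair.
    """
    cents = {label: centroid(pts) for label, pts in data.items()}
    deciders: Dict[Tuple[str, str], PairwiseDecider] = {}
    for a, b in combinations(sorted(data), 2):
        # Bind a and b as defaults so each closure keeps its own pair.
        deciders[(a, b)] = lambda x, a=a, b=b: (
            a if sq_dist(x, cents[a]) <= sq_dist(x, cents[b]) else b
        )
    return deciders


def dag_predict(
    x: Vector,
    labels: Sequence[str],
    deciders: Dict[Tuple[str, str], PairwiseDecider],
) -> str:
    """Walk the decision DAG: each node tests the first against the last
    remaining label and eliminates the loser, so k labels need k - 1 tests."""
    remaining = sorted(labels)
    while len(remaining) > 1:
        first, last = remaining[0], remaining[-1]
        winner = deciders[(first, last)](x)
        if winner == first:
            remaining.pop()      # 'last' is eliminated
        else:
            remaining.pop(0)     # 'first' is eliminated
    return remaining[0]


# Hypothetical toy data: relation label -> example feature vectors.
TOY_DATA = {
    "family": [(0.0, 0.0), (0.2, 0.1)],
    "colleague": [(5.0, 5.0), (5.2, 4.9)],
    "friend": [(0.0, 5.0), (0.1, 5.2)],
}

deciders = train_pairwise(TOY_DATA)
print(dag_predict((0.1, 0.0), list(TOY_DATA), deciders))
```

With k relation classes, this scheme trains k(k-1)/2 pairwise deciders but evaluates only k-1 of them per prediction, which is the usual motivation for a decision DAG over plain one-vs-one voting.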
【Degree-granting institution】: Beijing Jiaotong University
【Degree level】: Master's
【Year degree conferred】: 2016
【Classification number】: TP391.1
【Author】: Liu Bojia (刘博佳)
Article ID: 2310833
Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/2310833.html