Research on Chinese-Vietnamese Bilingual Corpus Construction and Event Graph Extraction Methods
Source: Kunming University of Science and Technology, 2017 master's thesis. Document type: degree thesis
Keywords: Vietnamese; event extraction; event element extraction; coreference relation extraction; event graph
[Abstract]: Event extraction from news is an important task in information extraction; its main goal is to extract the events contained in a text. Information extraction from Vietnamese news matters in particular, because handling relations with Vietnam well is important for regional economic development and political stability. A news article generally consists of multiple events. When reading news, people need not only the sub-events an article describes but also the relations among those events, which are themselves important information. How to obtain events and inter-event relations through event extraction is therefore a key question. This thesis addresses Chinese-Vietnamese bilingual news event extraction and studies three problems in depth: construction of a Chinese-Vietnamese bilingual news corpus, Chinese-Vietnamese event extraction, and construction of Chinese-Vietnamese bilingual event graphs. The main contributions are as follows.
(1) A Chinese-Vietnamese bilingual news corpus is constructed. To meet the needs of Chinese-Vietnamese news analysis and event extraction, the annotation content is defined to include event descriptions, event elements, temporal relations between events, event coreference relations, and cross-lingual event alignment relations. A total of 508 Chinese-Vietnamese bilingual news articles were collected and annotated in XML. The corpus supports the subsequent bilingual event extraction and bilingual event graph construction.
(2) An event extraction method combining machine learning and rules is implemented. First, features such as words and parts of speech, context words and their parts of speech, and semantic features are selected; the results of Chinese event recognition are incorporated into Vietnamese event recognition as guiding features; and a support vector machine (SVM) is trained as the event recognition model to identify event trigger words. Next, based on the grammatical and syntactic regularities of Chinese and Vietnamese, extraction rules are defined for event elements in different syntactic structures, and event elements are extracted by rule matching. Finally, rules for event element type resolution are defined and applied by rule matching; for event elements that no rule covers, the type is resolved by computing similarity against the word-sense set of the event type. Experimental results show that the proposed method improves Vietnamese event extraction.
(3) A bilingual event graph construction method based on events and inter-event relations is proposed. First, SVM models extract the coreference and temporal relations between events. Then, with events as nodes and inter-event relations as edges, a Chinese-Vietnamese bilingual event graph is built that combines coreference and temporal relations. Finally, drawing on the idea of the PageRank algorithm, the weights of nodes in the directed graph are computed to rank the Chinese-Vietnamese bilingual events. The resulting bilingual event graph serves as a representation of Chinese-Vietnamese news.
(4) Building on the above results, a prototype system for extracting event graphs from Chinese-Vietnamese bilingual news is designed and implemented.
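The graph construction and ranking step in contribution (3) can be illustrated with a minimal sketch. It is not the thesis's implementation: the event identifiers, the relation labels `coref` and `before`, the choice to treat coreference as a two-way edge, and the damping factor of 0.85 are all assumptions made for illustration, and the ranking is plain PageRank by power iteration rather than whatever variant the thesis adapts.

```python
def build_event_graph(events, relations):
    """Build a directed graph: event id -> list of successor event ids.

    `relations` holds (source, target, label) triples. Labels are
    hypothetical: "coref" for coreference, "before" for temporal order.
    """
    graph = {event: [] for event in events}
    for source, target, label in relations:
        graph[source].append(target)
        if label == "coref":
            # Assumption: coreference is symmetric, so add the reverse edge.
            graph[target].append(source)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Plain PageRank by power iteration over an adjacency-list graph."""
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread uniformly.
        dangling = sum(rank[node] for node, outs in graph.items() if not outs)
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n
            for node in graph
        }
        for node, outs in graph.items():
            for target in outs:
                new_rank[target] += damping * rank[node] / len(outs)
        rank = new_rank
    return rank


# Hypothetical bilingual example: two Chinese and two Vietnamese events.
events = ["zh:e1", "vi:e1", "zh:e2", "vi:e2"]
relations = [
    ("zh:e1", "vi:e1", "coref"),   # cross-lingual coreference
    ("zh:e2", "vi:e2", "coref"),
    ("zh:e1", "zh:e2", "before"),  # temporal order within each language
    ("vi:e1", "vi:e2", "before"),
]
graph = build_event_graph(events, relations)
ranks = pagerank(graph)
for event, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(event, round(score, 4))
```

In this toy graph the later events receive rank from both their coreferent counterpart and the preceding event, so they come out ahead of the earlier ones. Whether temporal edges should point from earlier to later events, and how coreference and temporal edges should be weighted against each other, are design choices the abstract does not specify.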
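[Degree-granting institution]: Kunming University of Science and Technology
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP391.1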