Research on Methods for Parsing Spatio-Temporal and Attribute Information of Events in Chinese Text

Published: 2018-07-22 16:10

[Abstract]: Supported by the national "863" Program project "Research on Association-Based Updating of Ubiquitous Spatial Information and Theme-Oriented Spatio-Temporal Information Mining", this dissertation systematically explores methods for parsing the spatio-temporal and attribute information of events in Chinese text. The work provides data and technical support for the dynamic, association-based updating of ubiquitous spatial information and for spatial information and knowledge services within a globally unified spatio-temporal framework. It also lays the data foundation for mining the spatio-temporal patterns of events, which in turn can support decision-making on major issues such as event risk assessment and public safety. Because Chinese text describes the spatio-temporal and attribute information of events in an unstructured, qualitative, and uncertain way, the research follows the technical pipeline "text description → normalized representation → structured extraction → visual reconstruction" and concentrates on methods for parsing this information. The main research contents and conclusions are as follows:
(1) Structured representation of event spatio-temporal and attribute information: by summarizing the linguistic features and semantic structures that Chinese text uses to describe the spatio-temporal and attribute information of events, a knowledge representation framework and an annotation scheme for this information were designed. Taking public emergencies as an example and online text as the data source, an annotated corpus of event spatio-temporal and attribute information in Chinese text was built on the GATE platform, providing standardized training and test data for research on extracting this information.
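GATE records annotations as standoff spans (typed start and end offsets over the text), whereas CRF toolkits typically expect one label per token. The sketch below shows one common way to turn such a corpus into sequence-labeling training data, assuming character-level tokens and a BIO tagging scheme. The `Annotation` class, the label names, and the example sentence are hypothetical and are not the dissertation's actual annotation scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A standoff annotation: a typed character span [start, end) over the raw text."""
    start: int
    end: int
    label: str  # hypothetical labels, e.g. "Time", "Location", "EventName"


def to_bio(text: str, annotations: list[Annotation]) -> list[tuple[str, str]]:
    """Convert non-overlapping standoff spans into one BIO tag per character."""
    tags = ["O"] * len(text)
    for ann in sorted(annotations, key=lambda a: a.start):
        if not 0 <= ann.start < ann.end <= len(text):
            raise ValueError(f"span out of range: {ann}")
        if any(tag != "O" for tag in tags[ann.start:ann.end]):
            raise ValueError(f"overlapping span: {ann}")
        tags[ann.start] = "B-" + ann.label  # first character of the span
        for i in range(ann.start + 1, ann.end):
            tags[i] = "I-" + ann.label  # remaining characters of the span
    return list(zip(text, tags))
```

For 7月21日北京发生特大暴雨 with Time, Location, and EventName spans over 7月21日, 北京, and 特大暴雨, the characters 发 and 生 remain tagged O.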
(2) Extraction of event spatio-temporal and attribute information: based on an analysis of the regularities in how Chinese text describes time, time information was extracted, inferred, and normalized by combining trigger words with a rule model, with a precision, recall, and F-value of 75.00%, 88.24%, and 40.54%, respectively. A conditional random field (CRF) model combined with a rule model was used to recognize event names and to extract spatial location information (including place names and spatial relations); event name recognition reached a precision, recall, and F-value of 82.08%, 80.18%, and 81.12%, respectively. A Bootstrapping-based algorithm was designed to extract event attribute information; for quantitative attributes (those expressed with numerals and measure words), it reached a precision of 80.80% and a recall of 85.16%.
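The time-extraction step pairs trigger words with rules. A minimal sketch of that idea follows, assuming a small hypothetical lexicon of relative day words and a single regular-expression rule for month-day dates; the dissertation's actual trigger list and rule set are not given in the abstract. The block also includes the balanced F-measure, F = 2PR/(P+R). Applied to the reported time-extraction precision and recall (75.00% and 88.24%), it gives about 81.08%, while the reported 40.54% equals PR/(P+R). Because any F-measure lies between precision and recall, 40.54% cannot be an F-value for those inputs. The event-name figures (82.08%, 80.18%, 81.12%) agree with the standard formula.

```python
import re
from datetime import date, timedelta

# Hypothetical trigger lexicon: relative day words and their offsets in days.
RELATIVE_DAYS = {"前天": -2, "昨天": -1, "今天": 0, "明天": 1, "后天": 2}

# Hypothetical rule for absolute dates like "2012年7月21日" or "7月21号" (year optional).
ABSOLUTE = r"(?:(?P<y>\d{4})年)?(?P<m>\d{1,2})月(?P<d>\d{1,2})[日号]"
TIME_PATTERN = re.compile(
    "(?P<rel>" + "|".join(map(re.escape, RELATIVE_DAYS)) + ")|" + ABSOLUTE
)


def normalize_times(text: str, reference: date) -> list[tuple[str, str]]:
    """Find time expressions and normalize them to ISO 8601 dates.

    Relative words are resolved against `reference` (e.g. the report's
    publication date), and a missing year is inferred from it.
    """
    results = []
    for m in TIME_PATTERN.finditer(text):
        if m.group("rel"):
            resolved = reference + timedelta(days=RELATIVE_DAYS[m.group("rel")])
        else:
            year = int(m.group("y")) if m.group("y") else reference.year
            try:
                resolved = date(year, int(m.group("m")), int(m.group("d")))
            except ValueError:  # e.g. "2月30日" matches the rule but is not a real date
                continue
        results.append((m.group(0), resolved.isoformat()))
    return results


def f1(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)
```

A real system would also need times of day, durations, and vague expressions such as 近日; this rule set covers only a single pattern of each kind.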
(3) Spatio-temporally driven event classification: by analyzing how events are perceived and expressed in space and time, an event classification method was proposed that fuses multiple kinds of contextual semantic and situational information, including time, space, attributes, event names, and trigger words. Methods for inferring alternative (substitute) names of events were also explored at three levels of linguistic unit: sentence, paragraph, and document. Experimental results show that event classification accuracy reached 92.30% in closed testing and 80.60% in open testing.
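The abstract does not name the classifier that combines these cues, so the following is only an illustration under an assumed one-vs-rest linear model. The elements extracted for an event mention (trigger words, attributes, name, and whether a time and a location were found) are fused into one sparse feature vector. Each category is scored independently, and every category whose score reaches a threshold is returned, which accommodates multi-label cases. The category names, features, and weights are hypothetical; in practice the weights would be learned from the annotated corpus.

```python
from collections import Counter

# Hypothetical per-label weights over fused features; in practice they would be learned.
WEIGHTS = {
    "natural_disaster": {"trigger=暴雨": 1.2, "trigger=地震": 1.2, "has_location": 0.1},
    "accident": {"trigger=坍塌": 1.2, "trigger=爆炸": 1.2, "attr=死亡人数": 0.4},
    "public_health": {"trigger=疫情": 1.2, "attr=感染人数": 0.5},
}


def fuse_features(event: dict) -> Counter:
    """Merge the extracted elements of one event mention into prefixed sparse features."""
    feats = Counter()
    for trigger in event.get("triggers", []):
        feats["trigger=" + trigger] += 1
    for attr in event.get("attributes", []):
        feats["attr=" + attr] += 1
    if event.get("name"):
        feats["name=" + event["name"]] += 1
    if event.get("time"):
        feats["has_time"] += 1
    if event.get("location"):
        feats["has_location"] += 1
    return feats


def classify(feats: Counter, weights: dict, threshold: float = 1.0) -> list[str]:
    """One-vs-rest linear scoring; every label reaching the threshold is kept (multi-label)."""
    return sorted(
        label
        for label, w in weights.items()
        if sum(w.get(f, 0.0) * v for f, v in feats.items()) >= threshold
    )
```

With these weights, a report of a rainstorm (暴雨) that causes a building collapse (坍塌) and deaths receives both the natural-disaster and the accident label.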
(4) Matching and visualization of event spatio-temporal information: using a place-name database (gazetteer) as the spatial data source, methods were proposed for matching and visualizing qualitative spatio-temporal information (place names, spatial relations, and time information). A method was explored for identifying thematic events and reconstructing their spatio-temporal processes under multiple consistency constraints on time, space, and concept type. This enabled an integrated, intuitive visualization of event information in a spatio-temporal information system, and the spatio-temporal distribution patterns of events were analyzed by clustering.
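Matching a qualitative description such as 南京以北5公里 ("5 km north of Nanjing") to coordinates can be sketched as a gazetteer lookup followed by a displacement along a compass bearing. The sketch assumes three things: a hypothetical in-memory gazetteer with illustrative coordinates, a crisp mapping from direction words to bearings, and a spherical Earth. The dissertation's place-name database and spatial-relation model are not described in the abstract, and a fuller treatment would return a fuzzy region rather than a single point.

```python
import math
from typing import Optional

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical model is assumed

# Hypothetical gazetteer entry (illustrative coordinates): name -> (lat, lon) in degrees.
GAZETTEER = {"南京": (32.06, 118.80)}

# Assumed mapping from Chinese direction words to bearings, clockwise from north.
BEARINGS = {"北": 0.0, "东北": 45.0, "东": 90.0, "东南": 135.0,
            "南": 180.0, "西南": 225.0, "西": 270.0, "西北": 315.0}


def destination(lat: float, lon: float, bearing_deg: float, distance_km: float) -> tuple[float, float]:
    """Point reached by travelling distance_km along a great circle from (lat, lon)."""
    phi1, lam1 = math.radians(lat), math.radians(lon)
    theta = math.radians(bearing_deg)
    delta = distance_km / EARTH_RADIUS_KM  # angular distance in radians
    phi2 = math.asin(math.sin(phi1) * math.cos(delta)
                     + math.cos(phi1) * math.sin(delta) * math.cos(theta))
    lam2 = lam1 + math.atan2(math.sin(theta) * math.sin(delta) * math.cos(phi1),
                             math.cos(delta) - math.sin(phi1) * math.sin(phi2))
    # Normalize longitude to [-180, 180).
    return math.degrees(phi2), (math.degrees(lam2) + 540.0) % 360.0 - 180.0


def locate(place: str, direction: Optional[str] = None, distance_km: float = 0.0) -> tuple[float, float]:
    """Resolve 'place + direction + distance' to an approximate point."""
    if place not in GAZETTEER:
        raise KeyError(f"place not in gazetteer: {place}")
    lat, lon = GAZETTEER[place]
    if direction is None or distance_km == 0:
        return lat, lon
    return destination(lat, lon, BEARINGS[direction], distance_km)
```

The great-circle formula is used instead of a flat-plane offset so that east-west displacements account for the shrinking length of a degree of longitude at higher latitudes.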
The results show that combining rule-based and statistical models can effectively extract the spatio-temporal and attribute information of events from Chinese text, although the choice of features plays a pivotal role in training the statistical models. The models for extracting time, place names, spatial relations, event names, and event types are general and portable across event types, whereas attribute information differs substantially between event types, so a dedicated knowledge base and learning model must be built for each specific type. Determining the type of an event is flexible, complex, semantically fuzzy, and uncertain, and it is a multi-label classification problem; fusing multiple kinds of contextual semantic and situational information, such as part of speech, trigger words, time, space, attributes, and event names, effectively improves classification. Finally, the coverage and quality of the spatial data, together with the spatial-relation parsing model, strongly affect the performance of matching event spatio-temporal and attribute information and of reconstructing spatio-temporal processes.
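The abstract names a Bootstrapping algorithm for quantitative attributes but gives no details. The sketch below illustrates the general pattern-instance bootstrapping loop under stated assumptions: an instance is an (event id, number) pair, a pattern is a regular expression with one capture group, and a new pattern is a fixed-width left or right context observed around known instances at least `min_support` times. The corpus, seed pattern, window, and thresholds are hypothetical. Because attribute wording differs between event types, seeds and thresholds would in practice be chosen per event type.

```python
import re
from collections import Counter


def bootstrap(corpus, seed_patterns, window=3, rounds=2, min_support=2):
    """Alternate between extracting (event_id, value) instances with the current
    patterns and inducing new context patterns around known instances."""
    patterns = set(seed_patterns)
    instances = set()
    for _ in range(rounds):
        # Step 1: apply every pattern; each has one capture group for the number.
        for event_id, sentence in corpus:
            for pattern in patterns:
                for m in re.finditer(pattern, sentence):
                    instances.add((event_id, m.group(1)))
        # Step 2: count fixed-width contexts of numbers already known to be
        # attribute values of the same event.
        support = Counter()
        for event_id, sentence in corpus:
            for m in re.finditer(r"\d+", sentence):
                if (event_id, m.group()) not in instances:
                    continue
                left = sentence[max(0, m.start() - window):m.start()]
                right = sentence[m.end():m.end() + window]
                if len(left) == window:
                    support[re.escape(left) + r"(\d+)"] += 1
                if len(right) == window:
                    support[r"(\d+)" + re.escape(right)] += 1
        # Step 3: keep only candidates seen often enough, to limit semantic drift.
        patterns |= {p for p, n in support.items() if n >= min_support}
    return patterns, instances


# Hypothetical corpus of (event id, sentence) pairs about casualty counts.
CORPUS = [
    ("A", "暴雨致死亡77人"), ("A", "共有77人遇难"),
    ("B", "爆炸死亡12人"), ("B", "现已12人遇难"),
    ("C", "截至目前5人遇难"), ("C", "气温升至35度"),
]
```

Starting from the single seed `死亡(\d+)人`, the first round learns the context `(\d+)人遇难` from events A and B, and the second round uses it to extract the count for event C, while the temperature 35 is never picked up.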
[Degree-granting institution]: Nanjing Normal University
[Degree level]: Doctoral
[Year conferred]: 2013
[CLC classification number]: P208; TP391.1
