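Construction of a Chinese Dependency Graph Bank
Published: 2018-02-11 02:26
Keywords: syntax and semantics; dependency grammar; graph structure; annotation; graph bank. Source: Nanjing Normal University, master's thesis, 2015. Document type: degree thesis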
[Abstract]: Natural language processing by computer must recover the semantic relations between words from linear sentences. Tree-shaped syntactic structures allow the main semantic relations among sentence constituents to be derived, and so play an important role in natural language processing. As corpora have grown in recent years, however, researchers have found that projective trees cannot fully describe syntactic structure, and that a considerable number of non-projective tree structures and graph structures occur. In addition, because of characteristics of Chinese itself, the accuracy of Chinese syntactic parsing has long been low, and existing parsing techniques are ill-suited to certain special Chinese constructions (serial-verb sentences, pivotal sentences, verb copying, long sentences, and so on), so new technical means are urgently needed to solve this problem. Some researchers have proposed AMR (Abstract Meaning Representation), a graph-based representation of sentence meaning, for analyzing English. This thesis draws on that approach to explore integrated syntactic-semantic annotation of Chinese based on dependency grammar (dependency graph annotation for short), and on that basis builds a Chinese dependency graph bank. The main content is as follows. First, the thesis reviews the development of syntactic theory and of methods for representing syntactic structure, and finds that syntactic analysis and argument analysis frequently yield structures that go beyond a tree, which is an important reason for introducing graph structures. A statistical analysis of the Chinese data from the CoNLL2009 evaluation then shows that not all semantic structures can be derived from tree structures, which calls for a new graph-based scheme for integrated syntactic-semantic annotation of Chinese sentences. Second, building on this theoretical preparation, a tag set and a concrete annotation specification for dependency graph annotation are developed step by step through actual annotation and repeated verification and revision; this is also an innovation of the study. Third, in the practical part, the tag set and annotation specification from the second step are used to annotate part of the existing Chinese data from the CoNLL2009 evaluation with dependency graphs. A total of 1230 sentences are annotated, and problems encountered during annotation are recorded. Fourth, the annotation results are counted and analyzed: of the 1230 annotated sentences, 795 form graph structures, or 64.6% of the corpus. This part mainly analyzes the special linguistic phenomena that give rise to graph structures in the annotation, such as pivotal sentences, serial-verb sentences, and bivalent nouns. The handling of these special sentences is where the dependency graph has its advantage over the dependency tree, and it is also the key to building a dependency graph bank. The innovations of the thesis are, first, the proposal to represent the results of Chinese syntactic-semantic analysis with graph structures; second, a new tag set and annotation specification for integrated syntactic-semantic annotation of Chinese; and, in addition, the combination of dependency grammar with frame semantics in the analysis of Chinese. Through this step-by-step study, the thesis finds that Chinese contains a certain number of sentences whose syntactic and semantic relations can be fully revealed only with a graph structure, and that such sentences are often the key factor limiting the accuracy of Chinese syntactic parsing. The annotation work itself and the results of the statistical analysis also show that graph structures have a clear advantage over tree structures in revealing the syntactic and semantic relations of sentences.
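The difference between a dependency tree and a dependency graph described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that do not come from the thesis: the edge labels (`agent`, `patient`, `comp`) are invented for illustration and are not the thesis's tag set, the example sentence 我请他来 ("I invite him to come") is chosen here as a typical pivotal sentence, and "forms a graph structure" is taken to mean "some token has more than one head", since the abstract does not state the exact criterion used.

```python
from collections import defaultdict

def heads_per_token(edges):
    """Map each dependent token index to the set of its head indices."""
    heads = defaultdict(set)
    for head, dep, _label in edges:
        heads[dep].add(head)
    return heads

def is_graph_structure(edges):
    """Return True if some token has more than one head.

    Assumed criterion: a tree gives every token exactly one head, so a
    token with two heads cannot be represented without dropping an edge.
    """
    return any(len(h) > 1 for h in heads_per_token(edges).values())

def graph_ratio(corpus):
    """Share of sentences in the corpus that need a graph structure."""
    return sum(is_graph_structure(s) for s in corpus) / len(corpus)

# Edges are (head, dependent, label); token 0 is an artificial root.
# Pivotal sentence: 我(1) 请(2) 他(3) 来(4). Labels are hypothetical.
pivotal = [
    (0, 2, "root"),
    (2, 1, "agent"),    # 请 -> 我
    (2, 3, "patient"),  # 请 -> 他
    (2, 4, "comp"),     # 请 -> 来
    (4, 3, "agent"),    # 来 -> 他: a second head for 他
]

# Plain sentence: 我(1) 请(2) 他(3); every token has a single head.
simple = [
    (0, 2, "root"),
    (2, 1, "agent"),
    (2, 3, "patient"),
]

print(is_graph_structure(pivotal))     # the pivot 他 has two heads
print(is_graph_structure(simple))
print(graph_ratio([pivotal, simple]))
print(round(795 / 1230 * 100, 1))      # proportion reported in the abstract
```

In the pivotal sentence, 他 is both what 请 acts on and the one who performs 来, so a tree must discard one of the two relations. The last line simply recomputes the reported share from the two counts given in the abstract.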
[Degree-granting institution]: Nanjing Normal University
[Degree level]: Master's
[Year degree granted]: 2015
[Classification number]: H146
Article ID: 1502001
Article link: https://www.wllwen.com/wenyilunwen/yuyanxuelw/1502001.html