
An Analysis of Corpus Annotation from the Perspective of Computational Linguistics

Published: 2019-01-24 11:08
[Abstract]: The emergence of corpora and the birth of corpus linguistics were epoch-making for linguistic research. Since corpora first appeared they have developed rapidly: their size has kept growing, their functions have kept improving, and the range of research and applications has kept widening. Corpus annotation has played a major role in this process. It is an important component of a corpus and has become a focus of corpus research. Annotation reveals deeper linguistic information, extends what a corpus can do, and is a prerequisite for using corpus resources in computational linguistic research. No existing work discusses corpus annotation comprehensively. Earlier studies concentrated on building practical annotation systems and examined individual annotation types in isolation; they are scattered across the technical specifications of large corpora and give little thought to the underlying theory.

From the perspective of computational linguistics, this thesis discusses the concept, significance, principles, and types of corpus annotation. It concentrates on two annotation types, structural annotation and semantic annotation, and proposes one model for each. The introduction summarizes the current state of research on corpus annotation in China, describes the research content and methods, and identifies the focus of the thesis. Chapter 2 derives the concept of corpus annotation from the characteristics of corpora and explains its significance from two angles. After explaining the annotation principles proposed by the corpus linguist Leech, it adds four principles to meet the annotation needs of newer corpora: (1) design a practical annotation system guided by the main purpose of the corpus; (2) attend to the compatibility of annotation across different levels; (3) make sure the annotation supports the relevant software; (4) design annotation that is easy to share. Chapter 3 introduces the older and newer corpus annotation schemes, explains a series of concepts related to the TEI annotation scheme, introduces the Standard Generalized Markup Language (SGML), which is closely tied to TEI, and summarizes several annotation types.

Chapter 4 analyzes grammatical annotation, with an emphasis on structural annotation. It introduces the two main kinds of structurally annotated corpora, phrase-structure treebanks and dependency treebanks, and proposes a minimal annotation model of syntactic structure suited to the grammatical structure of Chinese. The model takes immediate constituent analysis as its theoretical basis, describes the grammatical structure of a sentence with a simple symbol system, and realizes structural annotation in a form similar to part-of-speech tagging; it may serve as a reference for structural annotation of Chinese. Chapter 5 deals with semantic annotation and, building on earlier research, proposes a sentence-meaning annotation model. The tag set for the sentence-meaning layer is drawn up with reference to case grammar, and the model covers part-of-speech tagging, structural annotation, and sentence-meaning annotation. It carries a large amount of information, is easy to implement by machine, and offers a new reference model for sentence-meaning annotation of Chinese. Chapter 6 summarizes the characteristics of Chinese corpus annotation in terms of grammatical and semantic annotation. Chapter 7 concludes by reviewing the thesis and pointing out what remains to be improved.
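The abstract names two kinds of structurally annotated corpora, phrase-structure treebanks and dependency treebanks. The short sketch below only illustrates how the same sentence can be encoded in each style. It is not the thesis's minimal annotation model, whose symbol system is not given here. The example sentence, the part-of-speech tags (r, v, n), the phrase labels, and the relation labels (SBV, HED, VOB) are all assumptions chosen for illustration.

```python
# Hypothetical example sentence, already segmented and POS-tagged:
# 我 喜欢 语言学 ("I like linguistics"). Tags are illustrative only.
TOKENS = [("我", "r"), ("喜欢", "v"), ("语言学", "n")]

# Phrase-structure view: nested tuples of (label, child, child, ...).
# An integer child is an index into TOKENS.
PHRASE_TREE = ("S", ("NP", 0), ("VP", 1, ("NP", 2)))

# Dependency view: one (head_index, relation) pair per token.
# A head index of -1 marks the root of the sentence.
DEPENDENCIES = [(1, "SBV"), (-1, "HED"), (1, "VOB")]


def bracket(node):
    """Render a phrase-structure tree as a labeled bracketing."""
    if isinstance(node, int):
        word, pos = TOKENS[node]
        return f"{word}/{pos}"
    label, *children = node
    return "(" + label + " " + " ".join(bracket(c) for c in children) + ")"


def arcs():
    """Render each dependency as relation(head, dependent)."""
    rendered = []
    for index, (head, relation) in enumerate(DEPENDENCIES):
        head_word = "ROOT" if head < 0 else TOKENS[head][0]
        rendered.append(f"{relation}({head_word}, {TOKENS[index][0]})")
    return rendered


if __name__ == "__main__":
    print(bracket(PHRASE_TREE))
    for arc in arcs():
        print(arc)
```

The phrase-structure form groups words into nested constituents, while the dependency form attaches one head and one relation label to each word, which is closer in shape to per-word tagging.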
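[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: H087;H146;H136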




Article ID: 2414425


Article link: https://www.wllwen.com/wenyilunwen/hanyulw/2414425.html

