
A Study of the Distribution and Statistics of Tibetan Noun Phrase Structure Types

Published: 2018-03-05 17:10

  Topic: noun phrases  Focus: phrase structure  Source: Northwest Minzu University, 2017 master's thesis  Document type: degree thesis


[Abstract]: Big-data strategies and deep learning methods have become the mainstream technologies in Tibetan natural language processing. At present, the shortage of knowledge resources and annotated corpora is holding back research on intelligent Tibetan language processing. In particular, lexical-semantic resources comparable to WordNet, HowNet, and frame semantics, as well as resources annotated for syntactic structure, semantic roles, and discourse information, do not yet follow a unified standard, so mainstream methods such as deep learning cannot be applied to practical training. Building such resources has therefore become a fundamental and demanding task in Tibetan information processing. The study of noun phrases, verb phrases, and adjective phrases is a core problem in constructing a syntactic treebank. Working within the framework of a Tibetan syntactic treebank, this thesis classifies Tibetan noun phrases and their structures and compiles statistics on them, with the aim of testing the accuracy of the classification of Tibetan phrase structures, improving the efficiency of Tibetan phrase analysis, and accelerating the construction of the Tibetan syntactic treebank. The thesis is organized into eight chapters. It first discusses the background and current state of phrase research, and then reviews the syntactic analysis theories for noun phrases in English and Chinese and the corpus needed to build a noun phrase structure database. Next, it describes the concept of the noun phrase in English, Chinese, and Tibetan, analyzes the structures that form noun phrases in Tibetan using authentic Tibetan text, classifies the noun phrases formed by modification with different word classes, and on that basis establishes a tag set for Tibetan noun phrases. Finally, using statistics on noun phrase structures in the authentic Tibetan corpus, it builds a noun phrase structure database, a part-of-speech-tagged noun phrase database, and a noun phrase structure annotation tool. The work as a whole relies on corpus-based evidence, comparative analysis, statistical analysis, manual annotation, and manual proofreading, and it yields a basic Tibetan noun phrase structure database and a part-of-speech-tagged corpus. In sum, this study of the distribution and statistics of Tibetan noun phrase structure types provides basic resources for Tibetan syntactic and semantic analysis and for treebank construction, and offers some theoretical and technical support for application areas such as information retrieval, search engines, machine translation, text classification, pattern recognition, multimedia teaching, and networking.
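The abstract describes classifying noun phrases by the word classes that compose them and then counting how often each structure type occurs. A minimal sketch of that kind of count is given below, assuming each annotated noun phrase is available as a list of (token, tag) pairs. The tokens (`w1`, `w2`, ...), the tags (`n`, `a`, `m`), and the function names are hypothetical placeholders, not the thesis's tag set, data, or software.

```python
from collections import Counter

# Hypothetical annotated noun phrases: each phrase is a list of (token, POS tag)
# pairs. Tokens and tags are illustrative placeholders, not the thesis's data.
SAMPLE_PHRASES = [
    [("w1", "n"), ("w2", "a")],
    [("w3", "n"), ("w4", "a")],
    [("w5", "n"), ("w6", "m")],
    [("w7", "n"), ("w8", "n")],
]


def structure_type(phrase):
    """Reduce a phrase to its tag sequence, e.g. 'n+a'."""
    return "+".join(tag for _, tag in phrase)


def structure_distribution(phrases):
    """Return (pattern, count, relative frequency) tuples, most frequent first."""
    counts = Counter(structure_type(p) for p in phrases)
    total = sum(counts.values())
    return [(pattern, count, count / total)
            for pattern, count in counts.most_common()]


for pattern, count, share in structure_distribution(SAMPLE_PHRASES):
    print(f"{pattern}\t{count}\t{share:.2%}")
```

Representing a structure type as a joined tag sequence keeps the count independent of any particular tag set, so the same function would work with whatever labels an annotation scheme defines.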
[Degree-granting institution]: Northwest Minzu University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: H214





