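A Complex-Network-Based Dependency Language Network of Relation-Word Collocations in Chinese Complex Sentences and Its Applications (基于复杂网络的汉语复句关系词搭配依存语言网及其应用研究)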
Author: Hu Quan (胡泉). Source: Central China Normal University (华中师范大学), 2016 doctoral dissertation. Document type: degree thesis.
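Keywords: complex networks; dependency relations; dependency language networks; relation-word collocation in complex sentences; hierarchical relations of complex sentences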
[Abstract]: Complex network theory is a new approach that studies complex systems from a global perspective: however intricate a network's structure and however large its scale, it is analyzed through two basic elements, nodes and edges. Complex networks have become a powerful tool for studying complexity science and complex systems, and research on them is strongly interdisciplinary. Their study and application have not only spread into mathematics, physics, computer science, chemistry, biology, and engineering, but are also widely used in social, political, military, medical, economic, management, and linguistic contexts.

The core of dependency analysis is the dependency relation, a binary relation between a pair of words in a sentence: the governing word is called the head, and the governed word is called the dependent. The dependency relation is the main element of dependency grammar, which reveals the syntactic structure of a linguistic unit by analyzing the dependency relations among its components; these relations reflect the semantic dependence between head and dependent. Human language is one of the main tools people use to communicate, and it has evolved along with human progress. A large body of research now indicates that human language is itself a complex network within the complex systems of human society, and language networks have developed rapidly as a new research area within complex networks.

The main contents of this dissertation are as follows. (1) It first briefly introduces the "Ontology Knowledge Base of Complex-Sentence Relation Words" for Modern Chinese, then applies the theory and methods of complex networks to the collocating relation words in that knowledge base, extracting 457 of them to build a modern "Chinese Complex-Sentence Relation Word Collocation Database" (RWCDB). (2) It proposes a two-stage method for automatically recognizing collocating relation words in multi-level marked complex sentences. The first stage applies a "basic-criteria recognition method" to the collocation features of the relation words in a complex sentence and makes a preliminary judgment on whether two relation words can form a collocation. For the complex collocation class, a second-stage "tree diagram" recognition method then processes and recognizes the collocation relations. (3) On the basis of RWCDB, software is used to convert the database automatically into a modern "Chinese complex-sentence relation word collocation language network" with 457 relation-word nodes, and the small-world effect and scale-free property of this network are studied. (4) On the basis of this network, dependency analysis is applied to study the network from the ontological perspective of relation-word collocation. Because collocation relations between relation words in Modern Chinese complex sentences are directed, they in fact embody a dependency relation between the collocating words, namely the mutual dependency between a preceding relation word and the following relation word that answers it. The network built here is therefore in fact a "Chinese complex-sentence relation word collocation dependency language network." The dissertation studies in detail its dependency path length, average dependency path length, dependency degree, average dependency degree, dependency clustering, and average dependency clustering, and obtains a series of basic dependency feature values. (5) Starting from an analysis of dependency path length values, it derives a "dependency path length criterion," and on that basis develops a method, an algorithm, and software for automatically recognizing collocation relations between relation words in Chinese complex sentences. To validate the software, 1,000 marked complex sentences were selected from the CCCS corpus; in the experiment the software achieved an accuracy above 90%.

This research can help improve the recall, precision, and speed of web search engines, raise the quality of machine translation, advance computational linguistics and Chinese information processing, and improve both Chinese teaching in China's primary and secondary schools and the teaching of Chinese as a foreign language. Its direct contribution is to lay a foundation for computer recognition and processing of the hierarchical relations of complex sentences, and for further research on the major problems of automatically recognizing complex sentences and Chinese discourse. The research lies at the intersection of science, engineering, and the humanities, so all of it is exploratory. To date, no related research has been reported in the academic literature.
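The abstract names the network measures it studies (path length, degree, clustering) and a path-length criterion for recognizing collocations, but gives no construction details. The following is therefore only a minimal Python sketch of how such a directed collocation network might be built and measured. The word pairs in `PAIRS`, every function name, and the `max_len` threshold are illustrative assumptions, not the contents of RWCDB or the dissertation's actual algorithm.

```python
from collections import deque

# Hypothetical front-word -> back-word collocation pairs, for illustration only.
# They are NOT taken from RWCDB, whose contents the abstract does not list.
PAIRS = [
    ("因为", "所以"), ("由于", "所以"), ("由于", "因此"),
    ("虽然", "但是"), ("虽然", "却"), ("尽管", "但是"), ("尽管", "却"),
    ("如果", "那么"), ("如果", "就"), ("只要", "就"),
    ("既然", "就"), ("既然", "那么"),
]


def build_graph(pairs):
    """Build a directed graph as {node: set of successor nodes}."""
    graph = {}
    for front, back in pairs:
        graph.setdefault(front, set()).add(back)
        graph.setdefault(back, set())
    return graph


def path_length(graph, source, target):
    """Shortest directed path length in edges (BFS), or None if unreachable."""
    if source not in graph or target not in graph:
        return None
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node]
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None


def average_path_length(graph):
    """Mean shortest path length over ordered, reachable pairs of distinct nodes."""
    lengths = [
        d
        for s in graph
        for t in graph
        if s != t and (d := path_length(graph, s, t)) is not None
    ]
    return sum(lengths) / len(lengths) if lengths else 0.0


def degrees(graph):
    """Return {node: (in_degree, out_degree)}."""
    in_degree = {n: 0 for n in graph}
    for successors in graph.values():
        for m in successors:
            in_degree[m] += 1
    return {n: (in_degree[n], len(graph[n])) for n in graph}


def clustering(graph, node):
    """Local clustering coefficient of `node`, ignoring edge direction."""
    undirected = {n: set() for n in graph}
    for n, successors in graph.items():
        for m in successors:
            undirected[n].add(m)
            undirected[m].add(n)
    neighbours = sorted(undirected[node])
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i, a in enumerate(neighbours)
        for b in neighbours[i + 1:]
        if b in undirected[a]
    )
    return 2 * links / (k * (k - 1))


def may_collocate(graph, front, back, max_len=1):
    """Assumed criterion: a directed path of 1..max_len edges from front to back."""
    d = path_length(graph, front, back)
    return d is not None and 1 <= d <= max_len


if __name__ == "__main__":
    g = build_graph(PAIRS)
    print(len(g), "nodes; average path length:", average_path_length(g))
    print("degree of 就 (in, out):", degrees(g)["就"])
    print("虽然 ... 但是 may collocate:", may_collocate(g, "虽然", "但是"))
```

In this toy graph the back words have no outgoing edges, so every directed path has length 1 and no triangles exist. The 457-node network described in the abstract may behave quite differently.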
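[Degree-granting institution]: Central China Normal University (华中师范大学)
[Degree level]: Doctorate
[Year conferred]: 2016
[Classification number]: H146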