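A Study of Complex Sentence Hierarchical Structure Analysis Based on Marker Presence/Absence Rules and Association Features
Topic: complex sentence hierarchical structure + marker presence/absence patterns; Source: Central China Normal University, master's thesis, 2017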
[Abstract]: Chinese information processing, a branch of computational linguistics, is becoming increasingly important as artificial intelligence, search engines, and other Internet technologies develop rapidly. The use of Chinese is also spreading as China's international influence grows. Complex sentences are an important part of the Chinese language; they have become a core object of computational analysis and remain one of the difficult problems in Chinese information processing. Current research on complex sentences mainly covers four tasks: automatic identification of relation markers, distinguishing clauses from non-clauses, automatic division of complex sentence hierarchy, and recognition of complex sentence relations. Of these, marker identification and clause segmentation have been studied fairly extensively, while hierarchy division and relation recognition have received less attention.

Because automatic identification of relation markers is basically mature, and because relation markers themselves indicate both the hierarchical structure of a complex sentence and the logical-semantic relations between its clauses, any analysis of hierarchical structure should rely closely on relation markers as a key formal cue. However, owing to the diversity of Chinese expression, relation markers are regularly omitted within clauses (the presence or absence of markers, 标记隐现), which makes it very difficult to recognize hierarchy from relation markers alone.

To address this, the thesis adopts a divide-and-conquer strategy and divides its research object, marked complex sentences with three clauses (三句式有标复句), into two types: saturated (充盈态) and unsaturated (非充盈态). To handle omitted markers, it constructs a marker collocation type table (标记配位类型表) and a set of marker presence/absence rules, which together allow the marker presence/absence pattern of a complex sentence to be extracted automatically. In addition, building on dependency parsing of complex sentences, it proposes computing the degree of association between clauses from the recurrence of syntactic constituents. Finally, it builds a hierarchical structure recognition model based on the marker presence/absence rules and association features, so that the hierarchy of a complex sentence can be divided automatically.

The work proceeds in five steps. First, the clauses of a complex sentence are segmented using dependency syntax and punctuation. Second, after pseudo-clauses are removed, the relation markers in each clause are annotated and extracted to obtain the relation marker sequence of the sentence. Third, the marker collocation type table is constructed, and on that basis an algorithm is proposed for determining the marker presence/absence pattern of a given complex sentence. Fourth, building on dependency parsing, a method is proposed for computing the degree of association between clauses from the recurrence of syntactic constituents. Fifth, a model is built that judges hierarchical structure from the marker presence/absence rules and association features: the hierarchy of saturated three-clause marked complex sentences is identified with the marker presence/absence rules, while that of unsaturated ones is judged with the association features.

In the experiments, marker presence/absence patterns were identified with an accuracy of 91.5%, and hierarchical structure analysis reached an accuracy of 90.6%. These results indicate that the proposed method is effective for analyzing the hierarchical structure of complex sentences.
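The pipeline outlined in the abstract can be illustrated with a toy sketch. It is not the thesis's implementation, and everything specific in it is an assumption made for illustration: the mini-lexicon `MARKERS`, the paired-marker table `PAIRS`, the reading of "saturated" as "every clause carries a marker", and the function names. Character overlap is swapped in as a crude stand-in for the syntactic-constituent recurrence that the thesis computes from dependency parses, and clauses are split on punctuation only.

```python
import re

# Hypothetical mini-lexicon; the thesis's real marker inventory and
# marker collocation type table are not given in the abstract.
MARKERS = ["因为", "所以", "虽然", "但是", "如果", "那么", "不但", "而且"]
# Hypothetical paired markers standing in for the marker rules.
PAIRS = {("因为", "所以"), ("虽然", "但是"), ("如果", "那么"), ("不但", "而且")}


def split_clauses(sentence):
    """Split on clause-level punctuation only (no dependency syntax here)."""
    parts = re.split(r"[,,;;。!?!?]", sentence)
    return [p.strip() for p in parts if p.strip()]


def find_marker(clause):
    """Return the lexicon marker that occurs earliest in the clause, or None."""
    hits = [(clause.find(m), m) for m in MARKERS if m in clause]
    return min(hits)[1] if hits else None


def strip_markers(clause):
    """Remove lexicon markers so they do not count as shared content."""
    for m in MARKERS:
        clause = clause.replace(m, "")
    return clause


def association(a, b):
    """Jaccard overlap of characters: a stand-in for constituent recurrence."""
    sa, sb = set(strip_markers(a)), set(strip_markers(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def analyze(sentence):
    """Return '(1 2) 3' or '1 (2 3)' for a three-clause sentence."""
    clauses = split_clauses(sentence)
    if len(clauses) != 3:
        raise ValueError("this sketch handles three-clause sentences only")
    m1, m2, m3 = (find_marker(c) for c in clauses)
    saturated = None not in (m1, m2, m3)  # assumed reading of "saturated"
    if saturated:
        # Marker-based branch: group the clauses whose markers form a pair.
        if (m1, m2) in PAIRS:
            return "(1 2) 3"
        if (m2, m3) in PAIRS:
            return "1 (2 3)"
    # Unsaturated sentences (or no pair found): fall back to association.
    left = association(clauses[0], clauses[1])
    right = association(clauses[1], clauses[2])
    return "(1 2) 3" if left >= right else "1 (2 3)"


print(analyze("虽然天气很冷,但是他坚持跑步,所以身体很好。"))
print(analyze("他喜欢跑步,因为跑步很健康,跑步也很便宜。"))
```

The first call takes the marker-based branch because every clause carries a marker; the second has markers missing from two clauses and so falls back to the association score. A faithful version would replace `association` with a measure over dependency-parsed constituents and `PAIRS` with the thesis's collocation table and rules.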
[Degree-granting institution]: Central China Normal University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.1
Article ID: 2087386
Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/2087386.html