Research on the Recognition of Redundant Negation Constructions
Published: 2019-07-08 18:38
[Abstract]: This thesis proposes a new research topic: the recognition of redundant negation constructions. Redundant negation refers to cases in which a negative element such as 不, 没, 没有, 别, 非, or 未 is present in form but does not actually express negative meaning. Previous research on these constructions has concentrated on linguistic description, and no work on them has yet appeared in natural language processing. This thesis is devoted to recognizing them, which not only aids the semantic analysis of Chinese and machine natural language understanding, but also makes it possible to correct word segmentation errors on the basis of the recognition results. For several typical redundant negation constructions, the thesis examines their syntactic features, determines a recognition strategy, implements a recognition program in Python, inspects the output, and finally settles on a recognition method for each construction. The constructions mainly covered are "好不XP", "难免不/没(有)XP", "没(有)XP之前", and "差(一)点(儿)没(有)XP". Some other redundant negation constructions, such as "小心别XP", "险些没XP", and "拒不XP", are excluded from the study because too little corpus data is available for them. The recognition strategy differs somewhat from one construction to another. In general, however, because redundant negation is a highly specific phenomenon with little corpus data, statistical recognition is difficult, so the thesis adopts a rule-based approach: it first studies the redundant negation constructions in a training corpus, identifies the regularities, and formalizes them as algorithmic procedures that a computer can execute; it then applies the rules to newly input text to decide whether each candidate is a redundant negation construction. On the test corpus, the harmonic mean (F-measure) for the constructions recognized in this thesis is above 92%, and for "没(有)XP之前", "差(一)点(儿)没(有)XP", and "好不XP" it exceeds 95%, which indicates that recognition based on linguistic knowledge rules works reasonably well. In addition, the recognition results can improve the accuracy with which a machine interprets the relevant constructions; the size of the improvement depends on the specific construction.
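The abstract describes a rule-based pipeline evaluated by the harmonic mean of precision and recall, but it does not reproduce the rules themselves. The following is a minimal sketch of what the first step and the evaluation might look like. The regular expressions, the 10-character window, and the names `CANDIDATE_PATTERNS`, `find_candidates`, and `f_measure` are all assumptions made for illustration, not the author's implementation. The patterns only flag surface candidates for the four constructions; deciding whether a flagged negation is actually redundant would require the syntactic rules developed in the thesis, which are not shown here.

```python
import re

# Hypothetical surface patterns for the four constructions named in the abstract.
# They only locate candidate strings; they do NOT decide whether the negation
# is redundant (that decision needs the thesis's rules, which are not given).
CANDIDATE_PATTERNS = {
    "好不XP": re.compile(r"好不"),
    "难免不/没(有)XP": re.compile(r"难免(?:不|没有?)"),
    # Assumed window: at most 10 non-punctuation characters between 没(有) and 之前.
    "没(有)XP之前": re.compile(r"没有?[^,。!?;,.!?;]{1,10}?之前"),
    "差(一)点(儿)没(有)XP": re.compile(r"差一?点儿?没有?"),
}


def find_candidates(sentence):
    """Return (construction, matched text, start offset) for each surface match."""
    hits = []
    for name, pattern in CANDIDATE_PATTERNS.items():
        for match in pattern.finditer(sentence):
            hits.append((name, match.group(0), match.start()))
    # Order the hits by their position in the sentence.
    return sorted(hits, key=lambda hit: hit[2])


def f_measure(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall; returns 0.0 when undefined."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    for text in ["他差点儿没摔倒", "没吃饭之前要洗手", "今天街上好不热闹"]:
        print(text, find_candidates(text))
    print(f_measure(9, 1, 1))
```

Keeping candidate detection separate from the redundancy decision mirrors the two stages the abstract mentions (find the construction, then judge it), and it lets each construction get its own decision rule later without touching the matcher.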
[Degree-granting institution]: Shanghai University
[Degree level]: Master's
[Year degree conferred]: 2015
[Classification number]: H146
Document ID: 2511792
Article link: https://www.wllwen.com/wenyilunwen/yuyanxuelw/2511792.html