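Research on Sentiment Analysis of Person-Evaluation Texts (人物评价文本情感分析研究)
Published: 2018-05-19 21:14
Topics: Chinese text sentiment analysis + person-evaluation texts; Source: Soochow University, 2016 doctoral dissertation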
[Abstract]: Text sentiment analysis takes subjective text as its object of study and annotates, recognizes, classifies, clusters, and extracts from it in order to judge, extract, and aggregate the sentiments and opinions the text contains. Its main research topics currently include sentiment corpus construction, subjectivity classification, polarity analysis, opinion target extraction, sentiment summarization, and sentiment aggregation. As mobile Internet applications become widespread, applications such as public opinion analysis and product review analysis will play an increasingly broad and important role, and all of them build on text sentiment analysis research. Although research on text sentiment analysis has made some progress, it still falls well short of what practical applications require. Research on sentiment analysis of person-evaluation texts in particular is very scarce. Compared with the much-studied product review texts, the sentiment expressed in person-evaluation texts has distinctive characteristics, so earlier results cannot be applied directly to it. Targeting person-evaluation texts, this thesis uses machine learning and data mining methods to study sentiment analysis. The main work covers the following three aspects.
First, this thesis designs a scheme for building a person-evaluation corpus based on multi-classifier fusion and active learning, and obtains a corpus of positive and negative person evaluations as well as a profanity corpus. Starting from a small amount of manually annotated data, a person-evaluation corpus with positive and negative labels is expanded step by step using a conservative-voting rule for multi-classifier fusion. This corpus is the foundation for research on sentiment analysis of person-evaluation texts. Notably, because profanity is widespread in person-evaluation texts, a small number of profane sentences were collected and annotated by hand, and active learning was then applied over multiple iterations to form a high-quality profanity corpus. Experimental results show that a profanity recognition method built on this corpus improves the precision and recall of negative-evaluation recognition.
Second, this thesis proposes a two-tier person classification method based on a knowledge base and a search engine. Sentiment analysis is domain dependent, and the wording of evaluation texts differs considerably across types of people, so sentiment analysis of person evaluations urgently needs people to be divided by type. The method first classifies a person using the knowledge base; people who cannot be found in the knowledge base are classified using news text returned by a search engine. Because the search engine may return noisy news, a topic-model-based algorithm for extracting relevant news is designed. Experimental results show that the proposed method classifies person types effectively.
Third, this thesis proposes an opinion element extraction method based on maximum-weight perfect matching in a bipartite graph. Based on the modification and constraint relations between opinion targets and opinion words in text, opinion targets and opinion words are taken as the two vertex sets of a bipartite graph. On this basis, a sentence-level PMI (pointwise mutual information) calculation that combines part of speech and sentence relations is designed to compute the weights in the bipartite graph. The advantage of this method is that the resulting PMI values capture the association between opinion targets and opinion words at a fine grain. The Hungarian and Kuhn-Munkres algorithms are then used to find the maximum-weight perfect matching of the bipartite graph, and the result is filtered to obtain (opinion target, opinion word) pairs. Experimental results show that the proposed method effectively improves the precision and recall of extraction.
Finally, combining the techniques above, experiments mine the main opinion targets and commonly used opinion words in evaluation texts for different categories of people, and summarize how the opinion targets emphasized in positive evaluations differ from those in negative evaluations. Overall, the main contribution of this thesis is an in-depth study of the key problems in person-evaluation analysis, with new methods proposed for the person-evaluation sentiment corpus, person type classification, and the extraction of opinion targets and opinion words. These methods are also a useful reference for other areas of sentiment analysis.
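As a rough illustration of the extraction step described in the abstract, the sketch below weights a bipartite graph of opinion targets and opinion words with PMI, finds a maximum-weight perfect matching, and filters the result. It rests on several assumptions that are not in the thesis: plain sentence-level co-occurrence PMI stands in for the part-of-speech- and sentence-relation-aware variant (which the abstract does not specify), and the function names, the `floor` and `threshold` values, and the toy sentences are all hypothetical.

```python
import math


def sentence_pmi(sentences, targets, words, floor=-10.0):
    """Plain sentence-level PMI matrix (rows: targets, columns: words).

    `sentences` is a list of (target_set, word_set) pairs, assumed to be
    pre-extracted candidates. Pairs that never co-occur get `floor`
    (an assumed stand-in for minus infinity).
    """
    n = len(sentences)
    t_cnt = {t: sum(t in ts for ts, _ in sentences) for t in targets}
    w_cnt = {w: sum(w in ws for _, ws in sentences) for w in words}
    matrix = []
    for t in targets:
        row = []
        for w in words:
            co = sum(t in ts and w in ws for ts, ws in sentences)
            # PMI = log( p(t, w) / (p(t) * p(w)) ) with sentence frequencies.
            row.append(math.log(co * n / (t_cnt[t] * w_cnt[w])) if co else floor)
        matrix.append(row)
    return matrix


def max_weight_perfect_matching(weights):
    """Kuhn-Munkres (Hungarian) algorithm on a square weight matrix.

    Weights are negated so the classic min-cost formulation with vertex
    potentials yields a maximum-weight perfect matching.
    Returns (row, column) index pairs sorted by row.
    """
    n = len(weights)
    assert all(len(row) == n for row in weights), "matrix must be square"
    cost = [[-x for x in row] for row in weights]
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (n + 1)   # column potentials
    p = [0] * (n + 1)     # p[j]: row matched to column j (1-based, 0 = free)
    way = [0] * (n + 1)   # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0 != 0:    # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return sorted((p[j] - 1, j - 1) for j in range(1, n + 1))


def extract_pairs(sentences, targets, words, threshold=0.0):
    """Match targets to words, then keep pairs whose PMI exceeds `threshold`."""
    weights = sentence_pmi(sentences, targets, words)
    matching = max_weight_perfect_matching(weights)
    return [(targets[i], words[j], weights[i][j])
            for i, j in matching if weights[i][j] > threshold]


# Hypothetical toy data: candidate targets and opinion words per sentence.
toy_sentences = [
    ({"acting"}, {"superb"}),
    ({"acting"}, {"superb"}),
    ({"attitude"}, {"arrogant"}),
    ({"attitude"}, {"arrogant"}),
    ({"acting", "attitude"}, {"superb", "arrogant"}),
    ({"voice"}, {"sweet"}),
]
toy_targets = ["acting", "attitude", "voice"]
toy_words = ["superb", "arrogant", "sweet"]
pairs = extract_pairs(toy_sentences, toy_targets, toy_words)
print(pairs)
```

A perfect matching needs vertex sets of equal size, so a real pipeline would presumably pad the smaller set with dummy vertices; the final threshold filter is what would then discard such forced or weak pairs.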
[Degree-granting institution]: Soochow University
[Degree level]: Doctorate
[Year conferred]: 2016
[Classification number]: TP391.1
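[Similar literature]
Related doctoral dissertations (1 entry)
1 Zhu Xiaoxu (朱晓旭); Research on Sentiment Analysis of Person-Evaluation Texts [D]; Soochow University; 2016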
Article ID: 1911712
Article link: https://www.wllwen.com/shoufeilunwen/xxkjbs/1911712.html