
Research on Key Technologies for Sentiment Analysis of Chinese Microblogs

Published: 2018-10-08 21:36
[Abstract]: Since its introduction to China, the microblog (Weibo) has rapidly grown into a core social platform on which users express and share their feelings and opinions. Sentiment analysis of microblogs supports applications such as government opinion polling, public opinion monitoring and management, and commercial satisfaction surveys. Microblog text also differs considerably from traditional text, which poses greater challenges for natural language processing. Research on sentiment analysis of Chinese microblogs is still at an early stage in China, and many problems remain to be studied in depth. The topic therefore has considerable theoretical and practical value.

This thesis presents the key techniques it studies for Chinese microblog sentiment analysis: construction of Chinese microblog sentiment lexicons, feature generation and selection for microblog sentiment analysis, and microblog sentiment classifiers.

For lexicon construction, three lexicons are considered: a basic microblog sentiment lexicon, a microblog emoticon sentiment lexicon, and a microblog Internet-slang sentiment lexicon. A construction method is proposed for each according to its characteristics, and the lexicons are then applied to microblog sentiment analysis. Experiments show that classifying by the sum of sentiment weights (SO-A) reaches a micro-average of 78.61% on the microblog corpus, whereas classifying by the polarity of sentiment words (SO-P) reaches 70.76%. On a mixed corpus, SO-A reaches a micro-average of 79.88% and SO-P reaches 71.75%. These results indicate that the lexicons are effective in their selection of sentiment words, their polarity judgments, and their weight computation; that their quality is relatively high; and that they can be applied directly to microblogs and to other types of corpora, with the advantages of good classification performance, a simple procedure, and stable behavior.

For feature generation, feature selection, and classification, the thesis focuses on sentiment analysis with naive Bayes. Given the short-text nature of microblogs, each post is analyzed in two ways: as a single opinion, and as a set of segmented opinions. Three feature selection methods are studied (the CHI statistic, the sentiment lexicon, and syntactic paths combined with the sentiment lexicon), and three weighting schemes are used (term frequency, Boolean (BOOL) values, and TF-IDF). The highest micro-average is 75.69% in the single-opinion setting and 78.63% with opinion segmentation, which shows that opinion segmentation yields better microblog sentiment classification. With naive Bayes, BOOL weights and a second-pass extraction that combines syntactic paths with the sentiment lexicon work well, so the best preprocessing configuration is "opinion segmentation + second-pass extraction + BOOL weights", which reaches a micro-average of 78.63%.

In addition, sentiment analysis of large-scale online text is examined on a mixed corpus of microblog posts and product reviews. The lexicon-based method (micro-average 79.88%) outperforms naive Bayes (micro-average 67.8%) and is also simple, fast, and stable.
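The difference between the two lexicon-based decision rules named in the abstract, SO-A (sum of sentiment weights) and SO-P (polarity of sentiment words), can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the toy lexicon, its weights, the pre-tokenized input, and the treatment of a zero score as neutral are hypothetical, and the thesis's actual lexicons and weight computation are not reproduced here.

```python
from typing import Dict, List

# Hypothetical toy lexicon: token -> signed sentiment weight.
# These entries and weights are invented for illustration only.
TOY_LEXICON: Dict[str, float] = {
    "good": 0.6,
    "great": 0.9,
    "bad": -0.7,
    "awful": -2.0,
    "[smile]": 0.8,  # stands in for an emoticon entry
}


def _label(score: float) -> str:
    """Map a signed score to a class label (zero treated as neutral, an assumption)."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def so_a(tokens: List[str], lexicon: Dict[str, float]) -> str:
    """SO-A style rule: classify by the sum of the sentiment weights in the post."""
    return _label(sum(lexicon.get(tok, 0.0) for tok in tokens))


def so_p(tokens: List[str], lexicon: Dict[str, float]) -> str:
    """SO-P style rule: classify by polarity only, each sentiment word votes +1 or -1."""
    votes = 0
    for tok in tokens:
        weight = lexicon.get(tok, 0.0)
        if weight > 0:
            votes += 1
        elif weight < 0:
            votes -= 1
    return _label(votes)


# Two mildly positive words and one strongly negative word.
post = ["good", "great", "awful"]
print(so_a(post, TOY_LEXICON))  # weights sum to -0.5 -> negative
print(so_p(post, TOY_LEXICON))  # votes are +1 +1 -1 = +1 -> positive
```

In this toy case the two rules disagree because SO-P discards the magnitude of each word's weight, which is one plausible reading of why the weighted rule scores higher in the reported experiments.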
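The abstract's "BOOL weights" for naive Bayes can be read as present/absent features. The sketch below shows one such reading, a Bernoulli-style naive Bayes with add-one smoothing written in plain Python. It is an assumption, not the thesis's implementation: the function names, the toy training data, and the smoothing choice are hypothetical, and the opinion segmentation and second-pass feature extraction steps are not modeled.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Model = Tuple[Set[str], Dict[str, int], Dict[str, Dict[str, int]]]


def train_bool_nb(docs: List[Tuple[List[str], str]]) -> Model:
    """Count, per class, how many documents contain each token (Boolean features)."""
    vocab: Set[str] = set()
    class_docs: Dict[str, int] = defaultdict(int)
    doc_freq: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for tokens, label in docs:
        class_docs[label] += 1
        present = set(tokens)  # repeated tokens count once: the BOOL weight
        vocab |= present
        for tok in present:
            doc_freq[label][tok] += 1
    return vocab, dict(class_docs), doc_freq


def predict_bool_nb(tokens: List[str], model: Model) -> str:
    """Return the class with the highest log-posterior under the Bernoulli model."""
    vocab, class_docs, doc_freq = model
    total = sum(class_docs.values())
    present = set(tokens) & vocab
    best_label, best_score = "", -math.inf
    for label, n_docs in class_docs.items():
        score = math.log(n_docs / total)  # class prior
        for tok in vocab:
            # Add-one (Laplace) smoothing over the two outcomes present/absent.
            p_present = (doc_freq[label][tok] + 1) / (n_docs + 2)
            score += math.log(p_present if tok in present else 1.0 - p_present)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical, pre-tokenized training posts.
train = [
    (["good", "happy"], "pos"),
    (["great", "good"], "pos"),
    (["bad", "sad"], "neg"),
    (["awful", "bad"], "neg"),
]
model = train_bool_nb(train)
print(predict_bool_nb(["good", "good", "great"], model))  # pos
print(predict_bool_nb(["bad"], model))                    # neg
```

Because features are Boolean, the repeated "good" in the first test post contributes no more evidence than a single occurrence would, which is the practical difference from term-frequency weighting.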
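[Degree-granting institution]: Guangdong University of Foreign Studies
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: H085.5; TP391.1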


Article ID: 2258343



Link to this article: https://www.wllwen.com/wenyilunwen/yuyanxuelw/2258343.html

