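Research on Sentiment Analysis of Internet Product Reviews
Published: 2018-03-01 19:17
Keywords: sentiment analysis, product reviews, three-way decisions, mutual information, classifier. Source: Southeast University, 2016 master's thesis. Document type: degree thesis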
[Abstract]: With the rapid growth of the Internet and e-commerce, more and more people shop online. At the same time, the sheer volume of online products and product reviews makes it harder for shoppers to pick suitable, good-value products. Sentiment analysis of Internet product reviews has therefore become especially important. Sentiment analysis in this domain takes the reviews of a given online product as samples and uses machine learning and related methods to determine their sentiment orientation automatically, revealing whether people's opinions of and attitudes toward the product are positive or negative. This thesis studies sentiment analysis of Internet product reviews. Its main goal is to use computer techniques to analyze large volumes of review text for online products and determine their sentiment orientation, which helps consumers choose suitable products and helps merchants understand and improve their products. The work covers the following aspects.
1. Preprocessing for sentiment analysis of Internet products. Electronic products from an e-commerce website are chosen as the research object, and their review data are downloaded through Datatang (数据堂). The review data are then preprocessed, mainly through Chinese word segmentation, filtering, part-of-speech tagging, data cleaning, and data classification, in preparation for the subsequent sentiment analysis of the review text.
2. Selecting the optimal part-of-speech features and proposing an improved mutual information feature selection method based on the positive-negative correlation ratio. Feature selection plays a decisive role in sentiment classification, and suitable features improve classification accuracy. On the one hand, with respect to part-of-speech features, the candidate text features mainly include sentiment words, adjectives, adverbs, verbs, and sentiment modal particles. The thesis shows that sentiment modal particles are a useful auxiliary signal for sentiment classification and selects the optimal combination of part-of-speech features. On the other hand, feature selection methods are compared, the shortcomings of the traditional mutual information method are identified, and an improved mutual information feature selection method based on the positive-negative correlation ratio is proposed. Experiments show that the proposed optimal part-of-speech feature combination and the improved mutual information feature selection algorithm give better classification performance.
3. Analyzing the theory of three-way decisions and proposing a multi-decision weighted hybrid classifier. Three-way decisions perform better when handling uncertainty. Based on this idea, the thesis proposes a multi-decision weighted hybrid classifier and gives its main idea, rules, and definitions. A naive Bayes classifier and a support vector machine classifier are each used with its own optimal thresholds to make two rounds of three-way decisions. Texts that remain in the boundary region are classified by a probability-weighted vote of the naive Bayes and support vector machine classifiers. Experiments show that the multi-decision weighted hybrid classifier helps improve the accuracy of sentiment classification and has certain advantages.
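The hybrid classifier described in item 3 can be sketched roughly as follows. This is a minimal illustration, not the thesis's actual implementation: the abstract does not give the thresholds, the voting weights, or the order in which the two classifiers are consulted, so the names `Thresholds`, `three_way`, and `hybrid_classify`, the naive-Bayes-first ordering, the equal default weights, and the 0.5 vote cutoff are all assumptions made for the example. The inputs `p_nb` and `p_svm` stand for each classifier's estimated probability that a review is positive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical three-way decision thresholds, with beta < alpha."""
    alpha: float  # accept as positive when p >= alpha
    beta: float   # accept as negative when p <= beta


def three_way(p_pos: float, t: Thresholds) -> str:
    """Return 'positive', 'negative', or 'boundary' for one probability."""
    if p_pos >= t.alpha:
        return "positive"
    if p_pos <= t.beta:
        return "negative"
    return "boundary"  # defer the decision


def hybrid_classify(p_nb: float, p_svm: float,
                    t_nb: Thresholds, t_svm: Thresholds,
                    w_nb: float = 0.5, w_svm: float = 0.5) -> str:
    """Two rounds of three-way decisions, then a weighted vote.

    Assumed order: naive Bayes first, then the SVM. Only texts that
    both rounds leave in the boundary region reach the vote.
    """
    first = three_way(p_nb, t_nb)
    if first != "boundary":
        return first
    second = three_way(p_svm, t_svm)
    if second != "boundary":
        return second
    # Probability-weighted vote over the two classifiers (assumed form).
    score = (w_nb * p_nb + w_svm * p_svm) / (w_nb + w_svm)
    return "positive" if score >= 0.5 else "negative"


# Illustrative thresholds only; the thesis tunes them per classifier.
NB_T = Thresholds(alpha=0.8, beta=0.2)
SVM_T = Thresholds(alpha=0.7, beta=0.3)

if __name__ == "__main__":
    print(hybrid_classify(0.90, 0.40, NB_T, SVM_T))  # decided in round one
    print(hybrid_classify(0.50, 0.10, NB_T, SVM_T))  # decided in round two
    print(hybrid_classify(0.60, 0.55, NB_T, SVM_T))  # decided by the vote
```

The point of the design is that confident predictions are accepted early, while uncertain reviews are deferred rather than forced into a class, and only the remaining boundary cases use both classifiers' probabilities together.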
[Degree-granting institution]: Southeast University
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TP391.1
Article ID: 1553221
Article link: https://www.wllwen.com/jingjilunwen/dianzishangwulunwen/1553221.html