Sentiment Mining of Online Product Reviews Based on Multi-Feature Combinations with SVM and Probabilistic Neural Networks
Topic: SVM  Focus: probabilistic neural network  Source: Jiangsu University, 2017 master's thesis  Document type: degree thesis
[Abstract]: With the spread of the Internet and the rapid development of e-commerce technology, people increasingly prefer to shop online. Compared with offline shopping, online shopping is convenient, saves time, and is less constrained by time and place. Before buying a product online, consumers usually browse the reviews listed beneath it, and after buying they post their own evaluation of the product or service. The emergence of online product reviews has also changed the point in time at which companies can improve product quality. In traditional industrial engineering, that point is before the product leaves the production line. Now companies can collect user feedback after the product has been used, or learn users' real needs before the product is manufactured, which helps them understand consumers and improve product quality.

Whereas some scholars use machine learning to compute sentiment values for product features, this thesis focuses on the sentiment orientation of review text, that is, identifying whether a text expresses positive or negative sentiment. Reviews are processed at the clause level. SVM and a probabilistic neural network (PNN) are both used to identify the sentiment orientation of clauses, and their results are compared. The PNN is then used to predict the sentiment orientation of clauses, and the product attributes mentioned in the clauses are extracted and categorized to obtain the distribution of consumer sentiment across product attribute categories.

First, taking the Huawei Honor 4X (畅玩版) mobile phone on the Amazon website as an example, scraping rules are defined for its online product reviews, and the reviews are collected with the Bazhuayu (八爪鱼) data collector. The collected data are vectorized. The valid clauses in each review are identified and preprocessed with word segmentation, stop-word removal, and similar operations. Features such as sentiment words, negation words, degree adverbs, and special symbols are extracted from each clause using the corresponding dictionaries.

Next, text vectors are built from combinations of these features, models are trained with both SVM and the PNN, and their performance is evaluated to judge whether a PNN can be used for text sentiment recognition. For each method, five groups of experiments are run with different feature combinations, and the results are used to analyze how each feature contributes to sentiment recognition.

Finally, the experimental results show that the number of sentiment words and the number of negation words in a clause contribute strongly to sentiment recognition, while degree adverbs and special symbols contribute only weakly. In addition, judged by both accuracy and running time, the PNN is suitable for text sentiment recognition. The PNN model is then used to classify the experimental data, and the product attributes of the clauses are extracted and categorized to obtain the distribution of consumer sentiment across product attribute categories. The results show that the phone performs poorly on camera and screen, and the company can improve these two aspects in its next-generation product.
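The pipeline described in the abstract (dictionary-based clause features, a PNN classifier, then sentiment counts per product attribute) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the toy English lexicons, the attribute keyword map, the split of sentiment words into positive and negative counts, the smoothing parameter `sigma`, and the four training clauses are hypothetical and are not taken from the thesis. Clauses are assumed to be already segmented into tokens. The classifier is a plain Gaussian Parzen-window PNN, which may differ from the configuration the author used.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicons; the thesis's actual dictionaries are not given here.
POSITIVE_WORDS = {"good", "great", "clear"}
NEGATIVE_WORDS = {"bad", "poor", "blurry"}
NEGATION_WORDS = {"not", "never", "no"}
DEGREE_ADVERBS = {"very", "extremely", "quite"}
SPECIAL_SYMBOLS = {"!", "?"}
# Hypothetical keyword -> product attribute category map.
ATTRIBUTE_KEYWORDS = {"screen": "screen", "camera": "camera",
                      "photos": "camera", "battery": "battery"}


def clause_features(tokens):
    """Count each feature type in an already-segmented clause."""
    lexicons = (POSITIVE_WORDS, NEGATIVE_WORDS, NEGATION_WORDS,
                DEGREE_ADVERBS, SPECIAL_SYMBOLS)
    return [float(sum(t in lex for t in tokens)) for lex in lexicons]


class ProbabilisticNeuralNetwork:
    """Gaussian Parzen-window classifier: one pattern unit per training vector."""

    def __init__(self, sigma=0.5):
        self.sigma = sigma                  # kernel width (assumed value)
        self.patterns = defaultdict(list)   # class label -> stored vectors

    def fit(self, vectors, labels):
        for vec, label in zip(vectors, labels):
            self.patterns[label].append(vec)
        return self

    def class_scores(self, vec):
        """Average Gaussian kernel response of each class's stored patterns."""
        scores = {}
        for label, stored in self.patterns.items():
            total = 0.0
            for p in stored:
                sq_dist = sum((a - b) ** 2 for a, b in zip(vec, p))
                total += math.exp(-sq_dist / (2 * self.sigma ** 2))
            scores[label] = total / len(stored)
        return scores

    def predict(self, vec):
        scores = self.class_scores(vec)
        return max(scores, key=scores.get)


def sentiment_by_attribute(clauses, model):
    """Tally predicted sentiment labels per product attribute category."""
    dist = defaultdict(Counter)
    for tokens in clauses:
        label = model.predict(clause_features(tokens))
        for t in tokens:
            if t in ATTRIBUTE_KEYWORDS:
                dist[ATTRIBUTE_KEYWORDS[t]][label] += 1
    return dist


# Hypothetical labeled clauses, for illustration only.
train = [
    (["screen", "very", "clear", "!"], "positive"),
    (["camera", "not", "good"], "negative"),
    (["battery", "great"], "positive"),
    (["photos", "blurry"], "negative"),
]
model = ProbabilisticNeuralNetwork(sigma=0.5).fit(
    [clause_features(tokens) for tokens, _ in train],
    [label for _, label in train],
)

print(model.predict(clause_features(["sound", "quite", "good"])))
print(dict(sentiment_by_attribute(
    [["screen", "not", "clear"], ["camera", "blurry"]], model)))
```

A PNN has no iterative training step: fitting only stores the training vectors, so the cost moves to prediction time, which is one reason running time is a natural point of comparison with SVM. Dropping entries from the feature vector returned by `clause_features` is one simple way to reproduce the idea of comparing feature combinations.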
[Degree-granting institution]: Jiangsu University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP183; F713.36
Article ID: 1644313
Article link: https://www.wllwen.com/kejilunwen/zidonghuakongzhilunwen/1644313.html