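Research on Product Review Mining Techniques for B2C Websites
Posted: 2018-05-30 15:03
Topics: product reviews + review mining; Source: Beijing Jiaotong University, master's thesis, 2014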
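[Abstract]: As the B2C market grows, the number of product reviews that consumers post online is growing explosively. These reviews contain a great deal of information that is valuable to both merchants and consumers, so identifying and using that information accurately and efficiently promises substantial economic benefits and broad applications. This has made the mining and analysis of product reviews a research hotspot in recent years. Taking mobile phone reviews from JingDong Mall (JD.com), a large B2C website, as its research object, this thesis studies two aspects of product review text: sentiment classification and sentiment polarity analysis. The main work is as follows.

Sentiment classification of product reviews is studied with support vector machines (SVM) and naive Bayes (NB). First, reviews collected online are manually selected to form the training set; the corpus is then preprocessed with the NLPIR word segmentation system, and feature-word weights are computed with TF-IDF. Finally, three feature selection methods, MI (mutual information), IG (information gain), and CHI (chi-square), are compared experimentally on the SVM and NB classifiers. The results show that with CHI feature selection, both SVM and NB achieve classification performance above 80%. Moreover, under the same feature selection method, SVM outperforms NB, reaching an accuracy of 83%.

Fine-grained sentiment polarity analysis of product reviews is performed with a proximity-based "bidirectional iteration method". First, the PMI-IR algorithm is used to build a sentiment seed set; then the proximity-based bidirectional iteration method extracts associated feature-word/sentiment-word pairs. On this basis, a method for constructing a sentiment dictionary is proposed and a HowNet-based triple sentiment dictionary, Tri-HowNet, is built. Two polarity determination methods, one based on the HowNet polarity dictionary and one based on the Tri-HowNet sentiment dictionary, are compared experimentally. The results show that the latter performs better when determining the polarity of sentiment words with multiple senses.

A review mining system based on the SSH framework is designed and implemented. The system consists of five modules: dictionary maintenance, review collection, review classification, review sentiment analysis, and visualization. First, reviews are collected through the interfaces provided by the open-source Java library Crawler4j, using a simulated login via POST requests. Next, the reviews are analyzed from the two directions of text sentiment classification and sentiment analysis. Finally, the results are stored in the product analysis database and can be displayed as 3D bar charts, making them convenient for users to query and use.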
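The TF-IDF weighting step can be illustrated with a minimal Python sketch. It assumes the reviews have already been segmented into word lists (the thesis uses NLPIR for this; the English tokens below are toy stand-ins for segmented Chinese words) and uses one common TF-IDF variant, length-normalized term frequency times log(N / df). The abstract does not say which variant the thesis used, and the function name `tfidf_weights` is hypothetical.

```python
import math
from collections import Counter

def tfidf_weights(docs):
    """TF-IDF weights for a list of tokenized documents.

    Assumed variant: tf = count / doc_length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        if not doc:  # an empty review has no terms to weight
            weights.append({})
            continue
        counts = Counter(doc)
        length = len(doc)
        weights.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Toy segmented reviews (hypothetical, not data from the thesis).
reviews = [
    ["screen", "clear", "phone", "very", "good"],
    ["battery", "not", "durable", "phone", "bad"],
    ["screen", "very", "clear"],
]
for w in tfidf_weights(reviews):
    print({term: round(value, 3) for term, value in w.items()})
```

A term that occurs in every review gets an IDF of log(1) = 0 and hence weight 0, which is how TF-IDF down-weights words that carry no discriminating information.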
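Two of the feature selection criteria compared in the thesis, MI and IG, can be sketched from document counts. This is an illustrative sketch rather than the author's code: it uses the standard document-level definitions (information gain as class entropy minus the expected conditional entropy given the term's presence or absence, in bits; mutual information as log(A*N / ((A+B)(A+C))) maximized over classes), and all helper names are hypothetical.

```python
import math
from collections import Counter

def _doc_counts(docs, labels):
    """Document counts: total docs, docs per class, docs per term, docs per (term, class)."""
    class_sizes = Counter(labels)
    term_df = Counter()
    term_class_df = Counter()
    for doc, label in zip(docs, labels):
        for term in set(doc):
            term_df[term] += 1
            term_class_df[(term, label)] += 1
    return len(docs), class_sizes, term_df, term_class_df

def _entropy(counts):
    """Shannon entropy (bits) of the distribution given by non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((k / total) * math.log2(k / total) for k in counts if k > 0)

def information_gain(docs, labels):
    """IG(t) = H(C) - [P(t) H(C | t) + P(not t) H(C | not t)]."""
    n, class_sizes, term_df, term_class_df = _doc_counts(docs, labels)
    h_c = _entropy(list(class_sizes.values()))
    scores = {}
    for term, df in term_df.items():
        with_t = [term_class_df[(term, c)] for c in class_sizes]
        without_t = [class_sizes[c] - term_class_df[(term, c)] for c in class_sizes]
        expected = (df / n) * _entropy(with_t) + ((n - df) / n) * _entropy(without_t)
        scores[term] = h_c - expected
    return scores

def mutual_information(docs, labels):
    """MI(t) = max over classes of log(A * N / ((A + B) * (A + C))), skipping A == 0."""
    n, class_sizes, term_df, term_class_df = _doc_counts(docs, labels)
    scores = {}
    for term, df in term_df.items():
        # A + B = df (docs containing t); A + C = size (docs in class c).
        scores[term] = max(
            math.log(term_class_df[(term, c)] * n / (df * size))
            for c, size in class_sizes.items()
            if term_class_df[(term, c)] > 0
        )
    return scores

# Toy labeled reviews (hypothetical).
docs = [["good", "screen"], ["good", "battery"], ["bad", "battery"], ["bad", "screen"]]
labels = ["pos", "pos", "neg", "neg"]
print(information_gain(docs, labels))
print(mutual_information(docs, labels))
```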
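The CHI (chi-square) statistic, which gave the best results in the experiments summarized above, can be computed from a 2x2 contingency table for each term and class. The sketch below uses the standard formula and scores each term by its maximum over classes; the max aggregation, the top-k helper, and all names are assumptions for illustration, not details taken from the thesis.

```python
from collections import Counter

def chi_square_scores(docs, labels):
    """Score each term by its chi-square statistic, maximized over classes.

    For term t and class c (document counts):
      A: docs in c containing t      B: docs not in c containing t
      C: docs in c without t         D: docs not in c without t
      chi2 = N * (A*D - B*C)**2 / ((A+C) * (B+D) * (A+B) * (C+D))
    """
    n = len(docs)
    class_sizes = Counter(labels)
    term_df = Counter()
    term_class_df = Counter()
    for doc, label in zip(docs, labels):
        for term in set(doc):
            term_df[term] += 1
            term_class_df[(term, label)] += 1
    scores = {}
    for term, df in term_df.items():
        best = 0.0
        for cls, size in class_sizes.items():
            a = term_class_df[(term, cls)]
            b = df - a
            c = size - a
            d = n - size - b
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:  # skip degenerate tables, e.g. a term present in every doc
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[term] = best
    return scores

def select_top_k(scores, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

# Toy labeled reviews (hypothetical).
docs = [["good", "screen"], ["good", "battery"], ["bad", "battery"], ["bad", "screen"]]
labels = ["pos", "pos", "neg", "neg"]
scores = chi_square_scores(docs, labels)
print(select_top_k(scores, 2))
```

Here "good" and "bad" each occur in only one class and receive the maximum score, while "screen" and "battery" are spread evenly across classes and score 0, so the selected features are the sentiment-bearing words.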
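For the NB side of the comparison, a minimal multinomial naive Bayes with add-one (Laplace) smoothing is sketched below. The abstract does not state which NB variant or smoothing the thesis used, so both are assumptions; for brevity this version works on raw token counts rather than TF-IDF weights, and the SVM classifier is not reproduced.

```python
import math
from collections import Counter, defaultdict

class MultinomialNB:
    """Multinomial naive Bayes with add-one (Laplace) smoothing over token counts."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {term for doc in docs for term in doc}
        label_counts = Counter(labels)
        # log P(c): class prior from label frequencies.
        self.log_prior = {c: math.log(label_counts[c] / len(docs)) for c in self.classes}
        # Token counts per class and total tokens per class.
        self.term_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.term_counts[label].update(doc)
        self.total_terms = {c: sum(self.term_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, term, cls):
        # log P(term | c) with add-one smoothing over the training vocabulary.
        count = self.term_counts[cls][term]
        return math.log((count + 1) / (self.total_terms[cls] + len(self.vocab)))

    def predict(self, doc):
        known = [t for t in doc if t in self.vocab]  # terms unseen in training are skipped
        scores = {
            c: self.log_prior[c] + sum(self._log_likelihood(t, c) for t in known)
            for c in self.classes
        }
        return max(scores, key=scores.get)

# Toy labeled reviews (hypothetical).
train_docs = [
    ["good", "screen", "good"],
    ["good", "battery"],
    ["bad", "battery"],
    ["bad", "screen", "bad"],
]
train_labels = ["pos", "pos", "neg", "neg"]
model = MultinomialNB().fit(train_docs, train_labels)
print(model.predict(["good", "screen"]))
```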
[Abstract]: With the growth of the B2C market, the number of product reviews that consumers post on the Internet is increasing explosively. Because much valuable information is hidden in these reviews, identifying and using it accurately and efficiently promises great economic benefits and broad application prospects, which has made the mining and analysis of product reviews a research focus in recent years. This thesis takes mobile phone reviews from JingDong Mall, a large B2C website, as its research object and studies the sentiment classification and sentiment polarity analysis of product review texts. The main work is as follows.

Support vector machines (SVM) and naive Bayes (NB) are used to study the sentiment classification of product review texts. First, a training set is obtained by manually selecting reviews collected online; the corpus is then preprocessed with the NLPIR word segmentation system, and the weights of feature words are calculated with the TF-IDF method. Finally, the MI, IG, and CHI feature selection methods are compared experimentally on the SVM and NB classifiers. The experimental results show that with CHI feature selection, the classification performance of both SVM and NB exceeds 80%. In addition, under the same feature selection method SVM outperforms NB, with an accuracy of up to 83%.

A proximity-based "bidirectional iteration method" is used to analyze the fine-grained sentiment polarity of product review texts. First, the PMI-IR algorithm is used to construct a sentiment seed set; then the proximity-based bidirectional iteration method is used to obtain associated pairs of feature words and sentiment words. On this basis, a HowNet-based triple sentiment dictionary, Tri-HowNet, is constructed, and two polarity determination methods, based on the HowNet polarity dictionary and on the Tri-HowNet sentiment dictionary respectively, are compared experimentally. The experimental results show that the latter performs better than the former in determining the polarity of sentiment words with multiple senses.

A review mining system based on the SSH framework is designed and implemented. The system consists of five modules: dictionary maintenance, review collection, review classification, review sentiment analysis, and visual display. First, reviews are collected through the interfaces provided by the open-source Java library Crawler4j, using a simulated login via POST requests. Second, the reviews are analyzed from two directions: text sentiment classification and sentiment analysis. Finally, the results are stored in the product analysis database and can be displayed as 3D bar charts, which makes them convenient for users to query and use.
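PMI-IR scores a word's semantic orientation by comparing how strongly it co-occurs with positive versus negative reference words. The original PMI-IR method takes its co-occurrence counts from search-engine hits; the sketch below substitutes sentence-level counts from a small local corpus so that it runs offline. The seed words, the smoothing constant `eps`, and the function names are assumptions, not details from the thesis.

```python
import math

def hits(corpus, *words):
    """Number of sentences containing all given words (local stand-in for search-engine hits)."""
    return sum(1 for sentence in corpus if all(w in sentence for w in words))

def pmi(corpus, w1, w2, eps=0.01):
    """Pointwise mutual information in bits, estimated from sentence counts; eps avoids log(0)."""
    n = len(corpus)
    joint = hits(corpus, w1, w2) + eps
    return math.log2(joint * n / ((hits(corpus, w1) + eps) * (hits(corpus, w2) + eps)))

def semantic_orientation(corpus, word, pos_seeds, neg_seeds):
    """SO-PMI: total PMI with positive seeds minus total PMI with negative seeds."""
    return (sum(pmi(corpus, word, p) for p in pos_seeds)
            - sum(pmi(corpus, word, q) for q in neg_seeds))

# Toy segmented review sentences (hypothetical).
corpus = [
    ["screen", "clear", "good"],
    ["battery", "bad", "poor"],
    ["clear", "good"],
    ["poor", "bad"],
    ["screen", "good"],
]
for candidate in ["clear", "poor"]:
    print(candidate, round(semantic_orientation(corpus, candidate, ["good"], ["bad"]), 2))
```

A positive score suggests positive orientation and a negative score negative orientation; candidates with a large absolute score could then be admitted to the seed set, although the abstract does not give the threshold the thesis used.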
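The proximity-based bidirectional iteration can be read as a bootstrapping loop: known sentiment words pull in nearby nouns as feature words, and known feature words pull in nearby adjectives as new sentiment words, until no new words or pairs appear. The sketch below is one interpretation of that idea, not the thesis's exact algorithm; the POS tags `n`/`a`, the window size, the left-before-right tie-breaking, and the function names are all assumptions.

```python
def nearest_with_tag(sent, i, tag, window):
    """Index of the token closest to position i with the given tag (left checked before right)."""
    for d in range(1, window + 1):
        for j in (i - d, i + d):
            if 0 <= j < len(sent) and sent[j][1] == tag:
                return j
    return None

def bidirectional_iterate(sentences, seed_sentiments, window=3, max_rounds=10):
    """Grow feature/sentiment word sets and collect (feature, sentiment) pairs.

    sentences: lists of (word, tag) tuples; 'n' marks nouns and 'a' adjectives (assumed tags).
    """
    sentiments = set(seed_sentiments)
    features = set()
    pairs = set()
    for _ in range(max_rounds):
        before = (len(sentiments), len(features), len(pairs))
        for sent in sentences:
            for i, (word, _tag) in enumerate(sent):
                # Direction 1: a known sentiment word pulls in its nearest noun as a feature.
                if word in sentiments:
                    j = nearest_with_tag(sent, i, "n", window)
                    if j is not None:
                        features.add(sent[j][0])
                        pairs.add((sent[j][0], word))
                # Direction 2: a known feature word pulls in its nearest adjective as a sentiment word.
                if word in features:
                    j = nearest_with_tag(sent, i, "a", window)
                    if j is not None:
                        sentiments.add(sent[j][0])
                        pairs.add((word, sent[j][0]))
        if (len(sentiments), len(features), len(pairs)) == before:
            break  # fixed point: nothing new was found in this round
    return features, sentiments, pairs

# Toy POS-tagged sentences (hypothetical); 'd' marks an adverb.
sentences = [
    [("screen", "n"), ("very", "d"), ("clear", "a")],
    [("battery", "n"), ("very", "d"), ("clear", "a")],
    [("battery", "n"), ("durable", "a")],
    [("case", "n"), ("durable", "a")],
]
features, sentiments, pairs = bidirectional_iterate(sentences, {"clear"})
print(sorted(features), sorted(sentiments), sorted(pairs))
```

Starting from the single seed "clear", the loop discovers "battery" as a feature, then "durable" as a sentiment word through "battery", and finally "case" through "durable", which is the two-way propagation the method's name describes.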
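The advantage of a triple dictionary over a word-level polarity list shows up with context-dependent sentiment words. The toy sketch below contrasts the two lookups; the entries are invented examples (for instance, "high" is negative for price but positive for resolution), not contents of HowNet or Tri-HowNet, and falling back from the triple to the word-level polarity is an assumed design choice.

```python
# Hypothetical word-level polarity list (stand-in for a HowNet-style polarity lexicon).
WORD_POLARITY = {"clear": 1, "durable": 1, "high": 1, "bad": -1}

# Hypothetical (feature word, sentiment word) -> polarity triples in the spirit of Tri-HowNet.
TRIPLES = {
    ("price", "high"): -1,
    ("resolution", "high"): 1,
}

def word_level_polarity(feature, sentiment):
    """Context-free lookup: the feature word is ignored."""
    return WORD_POLARITY.get(sentiment, 0)

def triple_polarity(feature, sentiment):
    """Prefer the (feature, sentiment) triple; fall back to the word-level polarity, else 0."""
    return TRIPLES.get((feature, sentiment), WORD_POLARITY.get(sentiment, 0))

for feature, sentiment in [("price", "high"), ("resolution", "high"), ("screen", "clear")]:
    print(feature, sentiment,
          word_level_polarity(feature, sentiment), triple_polarity(feature, sentiment))
```

The word-level lookup assigns "high" the same polarity everywhere, while the triple lookup resolves it per feature, which mirrors why a triple dictionary can do better on sentiment words with multiple senses.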
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year conferred]: 2014
[Classification number]: TP391.1; TP393.092
Article link: https://www.wllwen.com/guanlilunwen/ydhl/1955723.html