Research on Methods for Extracting Implicit Evaluation Objects from Online Shopping User Reviews
Published: 2019-03-16 11:20
[Abstract]: As e-commerce in China has developed rapidly, online shopping has become part of daily life. Because of information asymmetry, consumers find it hard to learn the true condition of goods, so online user reviews serve as a reference for purchase decisions, and opinion mining on these reviews has attracted wide scholarly attention. Evaluation objects, one aspect of opinion mining, have also been studied extensively, but existing work concentrates on explicit evaluation objects, and few scholars take implicit evaluation objects into account. For researchers, studying implicit evaluation objects can improve the accuracy of evaluation-object research. For enterprises, fully mining implicit evaluation objects draws attention to the opinion targets hidden in consumer reviews and gives a more complete picture of how consumers experience each aspect of a product. For individual consumers, extraction of implicit evaluation objects by the e-commerce platform makes the useful reviews displayed or recommended to them more faithful, so they obtain more precise opinions from other users on each aspect of a product. On this basis, this thesis studies the mining of implicit evaluation objects in user reviews. The main work is as follows.

(1) Data preprocessing. Real user-review data is collected from the Taobao website with a data crawling tool, and the text is then processed by sentence segmentation, word segmentation, feature selection, and vector representation. Because the initial feature-word space has high dimensionality, a particle swarm optimization (PSO) algorithm based on simulated annealing is used for secondary feature extraction on the feature set, reducing the dimensionality of the feature-word space. Experimental results show that the method reduces the feature-word space from 425 dimensions to 296 dimensions, so it performs effective feature selection.

(2) Cluster analysis of explicit evaluation sentences. Evaluation sentences are divided into explicit and implicit evaluation sentences, and text clustering is carried out on the explicit ones. After evaluation sentences are represented by feature words, the resulting text vector space is still high-dimensional, so the thesis adopts the FCM (fuzzy c-means) clustering algorithm, which is suited to high-dimensional data sets. Because FCM easily falls into local optima, the thesis proposes an improved FCM algorithm based on simulated annealing, which controls the FCM iteration process to keep the algorithm from becoming trapped in a local optimum. In the experiments, the explicit evaluation sentences are clustered into 9 categories, and each category is given a name. The results show that the improved FCM algorithm based on simulated annealing clusters the text reasonably.

(3) Extraction of evaluation objects from implicit evaluation sentences. After the explicit evaluation sentences are clustered, the sentences of each category are collected into one document set. Because a mapping exists among the evaluation object, the evaluation word, and the category of an evaluation sentence, the thesis uses an association rule algorithm to mine association rules from the different document sets and builds an association rule table relating categories, evaluation objects, and evaluation words. Implicit evaluation objects are then extracted on the basis of this table. Comparative experiments show that the proposed implicit evaluation object extraction method reaches an accuracy of 75.26% and effectively improves the accuracy of text classification.
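The table lookup described in step (3) can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not specify the rule format, the thresholds, or the mining algorithm, so the function names `mine_rules` and `extract_implicit_object`, the `min_support` and `min_confidence` parameters, and the toy English data are all assumptions made for illustration. The sketch counts how often each evaluation word co-occurs with a (category, evaluation object) pair in explicit sentences, keeps the rules that pass both thresholds, and uses them to guess the object of an implicit sentence.

```python
from collections import Counter


def mine_rules(explicit_sentences, min_support=2, min_confidence=0.5):
    """Mine rules of the form: evaluation word -> (category, evaluation object).

    explicit_sentences: iterable of (category, evaluation_object, words) triples
    taken from explicit evaluation sentences (hypothetical input format).
    Returns {word: ((category, evaluation_object), confidence)}.
    """
    word_count = Counter()   # number of sentences containing each word
    pair_count = Counter()   # number of sentences with word AND target together
    for category, obj, words in explicit_sentences:
        for word in set(words):
            word_count[word] += 1
            pair_count[(word, (category, obj))] += 1

    rules = {}
    for (word, target), support in pair_count.items():
        confidence = support / word_count[word]
        if support >= min_support and confidence >= min_confidence:
            # Keep only the most confident rule for each evaluation word.
            if word not in rules or confidence > rules[word][1]:
                rules[word] = (target, confidence)
    return rules


def extract_implicit_object(words, rules):
    """Return the (category, object) of the most confident matching rule, or None."""
    best = None
    for word in words:
        if word in rules and (best is None or rules[word][1] > best[1]):
            best = rules[word]
    return best[0] if best else None


# Toy data, invented for illustration only (not from the thesis).
explicit = [
    ("price", "price", ["cheap"]),
    ("price", "price", ["cheap", "worth"]),
    ("logistics", "delivery", ["fast"]),
    ("logistics", "delivery", ["fast"]),
    ("quality", "fabric", ["good"]),
    ("price", "price", ["good"]),
]
rules = mine_rules(explicit)
print(extract_implicit_object(["fast"], rules))  # implicit sentence such as "very fast"
print(extract_implicit_object(["good"], rules))  # ambiguous word, no rule survives
```

In this sketch an ambiguous word such as "good" yields no rule because neither pairing reaches the support threshold, which is one simple way to avoid assigning an object when the evidence is weak; the thesis may resolve such cases differently.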
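[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP391.1
Article ID: 2441230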
Article link: https://www.wllwen.com/jingjilunwen/dianzishangwulunwen/2441230.html