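A Relationship-Based Spam Review Detection Method
Published: 2018-03-29 02:02
Topic: spam reviews; Focus: honest reviews; Source: Nanjing University of Posts and Telecommunications, 2014 master's thesis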
[Abstract]: Reviews on shopping websites give consumers valuable information for comparing product quality, merchant service, and similar aspects. Spam reviewers, however, mislead consumers by posting false or unfair reviews. Many existing spam review detection methods rely on the behavioral features of reviewers, but they cannot detect spam reviewers who deliberately imitate the behavior of normal reviewers.

Spam review detection is difficult because spam reviewers can easily post reviews that resemble normal ones, so methods that examine only the review or only the reviewer perform poorly. This thesis first applies the Analytic Hierarchy Process (AHP) to features such as store credibility and review text to obtain each review's honesty. It then uses a review relationship graph to analyze the interactions among reviewers, reviews, and stores, and finally classifies reviews with logistic regression.

The review relationship graph has three types of nodes: reviewers, reviews, and stores. The analysis yields three interaction relationships: the more honest reviews a reviewer posts, the higher that reviewer's reliability; the more honest positive reviews a store receives from reliable reviewers, the higher the store's credibility; and the more similar a review is to the other honest reviews around it, the higher its honesty. This is the first detection method in the spam review field based on such interaction relationships, and it overcomes the limitation of having little information available for detection. Experiments show that the proposed method detects more complex and finer-grained types of spam reviews, improves precision, recall, and other metrics, and greatly shortens the program's running time.
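The abstract names the three mutual-reinforcement relationships but does not give their formulas. The Python sketch below shows one way such a scheme could be wired together, and every formula, weight and name in it is a hypothetical assumption for illustration, not the thesis's method: `Review`, `ahp_weights`, `jaccard`, `reinforce`, the `mix` coefficients (0.5, 0.3, 0.2), and word-level Jaccard similarity as the text-similarity measure. AHP criterion weights are taken as the principal eigenvector of a pairwise comparison matrix, a review's prior honesty is the weighted sum of its feature scores, and reviewer reliability, store credibility and review honesty are then updated in turn until they stabilize.

```python
# Hypothetical sketch: the scoring formulas, weights and names are assumptions,
# not the formulas used in the thesis.
from dataclasses import dataclass


@dataclass
class Review:
    reviewer: str          # reviewer node id
    store: str             # store node id
    text: str              # review text, used for similarity
    positive: bool         # positive (True) or negative (False) rating
    features: list[float]  # hypothetical per-review feature scores in [0, 1]


def ahp_weights(matrix, iters=100, tol=1e-12):
    """AHP priority vector: principal eigenvector of a reciprocal pairwise
    comparison matrix, found by power iteration and normalised to sum to 1."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iters):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        v = [x / total for x in v]
        if max(abs(x - y) for x, y in zip(v, w)) < tol:
            return v
        w = v
    return w


def jaccard(a, b):
    """Word-set Jaccard similarity, a hypothetical stand-in for text similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def reinforce(reviews, weights, rounds=50, mix=(0.5, 0.3, 0.2), tol=1e-9):
    """Alternately update reviewer reliability, store credibility and review
    honesty on the reviewer-review-store graph.

    Returns (honesty list, reliability dict, credibility dict)."""
    if not reviews:
        return [], {}, {}
    by_reviewer, by_store = {}, {}
    for i, r in enumerate(reviews):
        by_reviewer.setdefault(r.reviewer, []).append(i)
        by_store.setdefault(r.store, []).append(i)

    def node_scores(h):
        # Reliability: mean honesty of the reviewer's reviews.
        rel = {u: sum(h[i] for i in idx) / len(idx) for u, idx in by_reviewer.items()}
        # Credibility: honesty * reviewer reliability summed over the store's
        # positive reviews, divided by the store's total number of reviews.
        cred = {
            s: sum(h[i] * rel[reviews[i].reviewer] for i in idx if reviews[i].positive) / len(idx)
            for s, idx in by_store.items()
        }
        return rel, cred

    # Prior honesty: AHP-weighted sum of each review's feature scores.
    h0 = [sum(w * f for w, f in zip(weights, r.features)) for r in reviews]
    h = list(h0)
    a, b, c = mix
    for _ in range(rounds):
        rel, cred = node_scores(h)
        new_h = []
        for i, r in enumerate(reviews):
            others = [j for j in by_store[r.store] if j != i]
            # Mean of similarity * honesty over the other reviews of the same store.
            nb = sum(jaccard(r.text, reviews[j].text) * h[j] for j in others) / len(others) if others else 0.0
            new_h.append(a * h0[i] + b * nb + c * cred[r.store])
        converged = max(abs(x - y) for x, y in zip(new_h, h)) < tol
        h = new_h
        if converged:
            break
    rel, cred = node_scores(h)
    return h, rel, cred


if __name__ == "__main__":
    weights = ahp_weights([[1, 2], [0.5, 1]])  # criterion 1 judged twice as important
    reviews = [
        Review("alice", "s1", "great fast shipping good quality", True, [0.9, 0.8]),
        Review("bob", "s1", "fast shipping and good quality", True, [0.8, 0.9]),
        Review("mallory", "s1", "buy now best deal ever", True, [0.1, 0.2]),
    ]
    honesty, reliability, credibility = reinforce(reviews, weights)
    for r, score in zip(reviews, honesty):
        print(f"{r.reviewer:8s} honesty={score:.3f}")
    print("store credibility:", {s: round(v, 3) for s, v in credibility.items()})
```

Because the mix coefficients sum to 1 and every input score lies in [0, 1], all three scores stay in [0, 1] throughout the iteration. Power iteration is used for the AHP eigenvector only to keep the sketch dependency-free. The resulting honesty, reliability and credibility scores could then serve as input features to a logistic-regression classifier, the final step the abstract describes; that step is not shown here.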
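[Degree-granting institution]: Nanjing University of Posts and Telecommunications
[Degree level]: Master's
[Year awarded]: 2014
[Classification number]: TP393.092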
[Citation]: Wang Yun (王云). A Relationship-Based Spam Review Detection Method [D]. Nanjing University of Posts and Telecommunications, 2014.
Article ID: 1679041
Article link: https://www.wllwen.com/guanlilunwen/ydhl/1679041.html