Research on Deep Learning-Based Fake Review Detection Methods

Published: 2018-06-09 21:14

Topics: fake review detection + deep learning; Source: master's thesis, Harbin Institute of Technology, 2017

[Abstract]: With the growth of the Internet and mobile devices, e-commerce has become an indispensable part of daily life, bringing a rapid increase in the volume of product information and user reviews. User reviews play a crucial role in e-commerce: online shoppers treat them as a reference for judging product quality, so they sway purchasing decisions. Driven by profit, some merchants therefore hire professional writers to post favorable reviews of their own products or negative reviews of competitors' products, which seriously harms the ecosystem of e-commerce platforms. Existing studies show that people are poor at identifying such fake reviews manually. To detect them effectively, some researchers have used methods based on shallow, explicit semantic features, with some success. Deep learning methods, by contrast, can mine deeper semantic features, and this thesis takes deep learning as its focus for the fake review detection task. The work is summarized in the following four points:
(1) Fake review detection based on traditional models. Four classifiers are used. Based on the characteristics of the fake review corpus, four categories of features are proposed: textual features, sentiment-orientation features, psychological features, and syntax-related features. A multi-model voting strategy is adopted, and the experimental results exceed the baseline method.
(2) Corpus expansion with a semi-supervised learning algorithm. To address the scarcity of fake review corpora, a crawler first collects review data; a semi-supervised learning algorithm then uses a small amount of labeled data to select high-confidence reviews from the crawled dataset and add them to the corpus.
(3) Fake review detection based on deep learning models. With word vectors as input, experiments are run on LSTM, bidirectional LSTM, and CNN models, and fusions of these models are also tried. The results show that the hybrid CNN-LSTM model performs best, improving accuracy by 2 percentage points over the baseline method.
(4) Fake review detection with an Attention mechanism. Two attention mechanisms are implemented: a feed-forward attention model and a context-based attention model. The attention mechanism distinguishes the importance of the words in a sentence and uses the attention weights to obtain a more accurate sentence representation. Applying attention to the LSTM model and to the hybrid LSTM-CNN model further improves accuracy.
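As a rough illustration of the feed-forward attention idea in point (4), the sketch below pools a sequence of per-word hidden states (for example, LSTM outputs) into one sentence vector: each state gets a scalar score, the scores are normalized with softmax, and the states are averaged with those weights. The scoring function tanh(w·h + b), the function name `feed_forward_attention`, and the parameters `w` and `b` are assumptions made for this example; the abstract does not specify the thesis's exact formulation, and in practice `w` and `b` would be learned rather than fixed by hand.

```python
import math

def feed_forward_attention(hidden_states, w, b):
    """Pool T hidden states (each a list of d floats) into one d-dim vector.

    w (length d) and b (scalar) are hypothetical scoring parameters;
    in a trained model they would be learned jointly with the encoder.
    """
    # Score each time step: e_t = tanh(w . h_t + b)
    scores = [math.tanh(sum(wi * hi for wi, hi in zip(w, h)) + b)
              for h in hidden_states]
    # Softmax over time steps (subtract the max for numerical stability)
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Sentence representation: attention-weighted sum of the hidden states
    dim = len(hidden_states[0])
    sentence = [sum(a * h[j] for a, h in zip(weights, hidden_states))
                for j in range(dim)]
    return weights, sentence

# Toy usage: three "word" states of dimension 2 with hand-picked parameters.
states = [[0.1, 0.2], [0.9, -0.4], [0.3, 0.3]]
weights, sentence = feed_forward_attention(states, w=[1.0, -1.0], b=0.0)
```

Here the second state receives the largest weight because its score is highest, so it contributes most to the pooled sentence vector; a classifier layer would then take `sentence` as input.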
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP391.1

[Similar literature]