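Research on Text Sentiment Prediction Based on Active Learning and Transfer Learning

Published: 2018-04-23 04:31

Topics: active learning + transfer learning; source: Shanxi University, master's thesis, 2016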


[Abstract]: With the widespread use of emerging e-commerce platforms, users enjoy their convenience and also post opinions about products on forums. Through these reviews, ordinary users can learn about product performance and make rational purchasing choices, while producers can quickly grasp market trends and make sound marketing decisions. Opinion mining and sentiment analysis of product reviews are therefore effective means of addressing such problems. Traditional supervised learning methods are mostly applied to static, single-domain data and require large amounts of labeled data, whereas transfer learning can use existing labeled data to learn a classification model, which addresses the shortage of labeled samples in the target data. Because data from different domains or different periods differ to some degree, this thesis uses active learning to optimize the classification model and improve text sentiment prediction. The main research content is as follows.

(1) Problem analysis for text sentiment prediction. Based on the experimental corpus, the thesis analyzes the problems in current sentiment analysis research from three aspects: the limitations of traditional text representation, the diversity of linguistic expression in review text, and the shift in what reviews focus on across time periods. It proposes a corresponding solution for each.

(2) Cross-domain text sentiment prediction based on active learning and transfer learning. To address the diversity of linguistic expression caused by domain differences in static cross-domain data, the thesis proposes a cross-domain text sentiment prediction method based on active learning and transfer learning. A classification model is first trained on source-domain data, and target-domain texts predicted with high confidence are selected as the initial seed samples. In each iteration, low-confidence texts labeled by experts are added to the training set together with high-confidence texts, which speeds up optimization of the target-domain classification model. A feature set is then extracted dynamically from the training set using a sentiment lexicon, extraction rules for evaluation-word collocations, and auxiliary feature words. Finally, the optimized classification model classifies the test set. Compared with Active-Dynamic, Active-Semi-Dynamic improves average precision by 2.75 percentage points, which indicates that adding high-confidence samples enriches the training samples and feature information and helps train the classification model. Compared with Active-BOW, Active-Semi-Dynamic improves average precision by 2.79 percentage points, which indicates that extracting sentiment words by combining a sentiment lexicon with dependency parsing characterizes the sentiment of a text more accurately and improves cross-domain sentiment prediction.

(3) Sentiment prediction for time-series reviews based on active learning and transfer learning. To address the shift in review focus caused by differences in review time in dynamic time-series data, the thesis proposes a time-series review sentiment prediction method based on active learning and transfer learning. Following the idea of transfer learning, labeled data from the previous period are used to obtain the initial labeled samples for the current period. During active learning, the SMOTE algorithm balances the training set, and the optimized classification model predicts the sentiment orientation of car reviews in the current period. Compared with UN_SMOTE, the SMOTE algorithm improves average accuracy by 4.32 percentage points, which indicates that inserting new samples into the minority class during model optimization balances the training corpus and improves sentiment prediction for car reviews. The method also enables sentiment prediction for mixed-class reviews.
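Two of the steps described in the abstract can be illustrated with a minimal Python sketch: the per-round sample selection (low-confidence texts go to an expert, high-confidence texts keep their predicted label) and SMOTE-style interpolation of new minority-class samples. This is not the thesis's implementation. The function names `split_by_confidence` and `smote_like`, the binary positive/negative setting, the 0.9 threshold, and the choice of k are all assumptions made for illustration, since the abstract does not give the classifier, the thresholds, or the feature representation.

```python
import random


def split_by_confidence(probs, n_query, high_threshold=0.9):
    """Pick samples for one active-learning round (hypothetical helper).

    probs: positive-class probabilities predicted for unlabeled texts.
    Returns (query_idx, self_labeled):
      query_idx    - indices of the n_query least confident texts (for expert labeling)
      self_labeled - (index, predicted_label) pairs with confidence >= high_threshold
    """
    # Confidence of a binary prediction is the probability of the predicted class.
    conf = [max(p, 1.0 - p) for p in probs]
    order = sorted(range(len(probs)), key=lambda i: conf[i])
    query_idx = order[:n_query]
    queried = set(query_idx)
    self_labeled = [
        (i, int(probs[i] >= 0.5))
        for i in range(len(probs))
        if i not in queried and conf[i] >= high_threshold
    ]
    return query_idx, self_labeled


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples (assumes at least 2 samples).

    Each new point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority neighbours, which is the core
    idea of SMOTE. The values of k and seed here are illustrative.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [x for x in minority if x is not base]
        others.sort(key=lambda x: sum((a - b) ** 2 for a, b in zip(base, x)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Usage with made-up numbers (not data from the thesis).
probs = [0.97, 0.52, 0.08, 0.45, 0.91, 0.70]
query_idx, self_labeled = split_by_confidence(probs, n_query=2)
print("ask the expert about:", query_idx)
print("add with predicted labels:", self_labeled)

minority_vectors = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
print("synthetic minority samples:", smote_like(minority_vectors, n_new=3))
```

In a full loop, the expert-labeled and self-labeled texts would be appended to the training set, the minority class would be topped up with synthetic samples, and the classifier would be retrained before the next round. A maintained SMOTE implementation from an existing library would normally be preferable to this hand-written interpolation.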

[Degree-granting institution]: Shanxi University
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP391.1
