Research on Cross-Domain Sentiment Classification Algorithms for Chinese Short Texts

Published: 2018-07-04 14:59

Topic: sentiment classification + cross-domain; Source: Chongqing University, master's thesis, 2016

[Abstract]: With the rapid development of e-commerce and the rise of platforms such as Weibo and WeChat, the volume of short-text reviews on the Internet is growing exponentially, and these reviews carry great economic and social value. Processing them by hand is increasingly impractical, so automatically mining useful information from reviews has become an active research topic in natural language processing. Text sentiment classification arose to meet this need, and cross-domain sentiment classification is the more practical variant because it requires no labeled reviews from the target domain. As a technique for mining subjective text, sentiment classification aims to determine a reviewer's sentiment orientation and attitude (positive or negative, recommend or not recommend, and so on) toward an entity (a product, service, event, etc.). Building on an in-depth study of existing sentiment classification algorithms and related techniques, this thesis proposes cross-domain sentiment classification algorithms of its own. The main results are as follows:
(1) A cross-domain sentiment classification algorithm based on a Sentiment Sensitive Thesaurus (SST). To address the domain independence problem between the source (original) domain and the target domain in cross-domain classification, the thesis builds an SST, uses it to expand the feature vectors of the review sets of both domains, and then performs classifier training and prediction on the expanded review sets. The SST is built on the review sets of both the source and target domains, so it contains features of both. A support vector machine (SVM) is trained on the expanded source-domain review set, and the resulting classifier predicts labels for the expanded target-domain review set. Nine groups of experiments on data sets from three domains (hotels, computers, and books) show that the SST-based algorithm classifies well. The thesis also examines experimentally how the parameter K and the training-set size affect classification performance.
(2) A voting-ensemble cross-domain sentiment classification algorithm. Following the idea of ensemble learning, the results of several base classifiers are combined to improve classification performance. Two schemes are tested, simple voting and weighted voting, again on the hotel, computer, and book corpora. The results show that the voting ensemble clearly outperforms a single base classifier.
(3) An improved Stacking ensemble classification algorithm. The algorithm first classifies the target-domain review set with an unsupervised method based on the NTUSD sentiment lexicon, labels some of the reviews with strong sentiment polarity, and adds them to the source-domain review set, which expands the training set and reduces the difference between domains. This improves the practical performance of Stacking in cross-domain classification. The experimental results show that the Stacking ensemble algorithm achieves good classification performance and that ensemble learning is a worthwhile direction for cross-domain sentiment classification.
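The two voting schemes named in result (2) can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names `simple_vote` and `weighted_vote` are hypothetical, labels are assumed to be 1 (positive) and 0 (negative), and the weights are assumed to be something like each base classifier's accuracy on held-out data, since the abstract does not say how the weights were chosen.

```python
from collections import Counter
from typing import Dict, Sequence


def simple_vote(predictions: Sequence[int]) -> int:
    """Return the label predicted by the largest number of base classifiers."""
    counts = Counter(predictions)
    # most_common(1) yields the (label, count) pair with the highest count.
    return counts.most_common(1)[0][0]


def weighted_vote(predictions: Sequence[int], weights: Sequence[float]) -> int:
    """Return the label whose supporting classifiers carry the largest total weight."""
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per base classifier")
    totals: Dict[int, float] = {}
    for label, weight in zip(predictions, weights):
        # Accumulate the weight of every classifier that voted for this label.
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)


# Three hypothetical base classifiers judge one review: 1 = positive, 0 = negative.
votes = [1, 0, 0]
# Assumed weights, e.g. validation accuracy of each base classifier.
accuracies = [0.9, 0.3, 0.4]

print(simple_vote(votes))                # two of the three classifiers say 0
print(weighted_vote(votes, accuracies))  # 0.9 for label 1 beats 0.3 + 0.4 for label 0
```

In this example the two schemes disagree: simple voting follows the majority, while weighted voting lets one highly weighted classifier override two weak ones. Using an odd number of base classifiers avoids ties in the simple scheme for two-class problems.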
[Degree-granting institution]: Chongqing University
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TP391.1
