Research on Text Sentiment Classification Based on Semi-Supervised Co-Training

Published: 2019-02-19 12:17

[Abstract]: With the rapid development of Web 2.0, a large amount of user-generated content (User Generated Content) has appeared on the Internet. This content carries a great deal of useful sentiment information, which is valuable for user decision-making and for enterprises' product improvement. How to use text sentiment classification to mine the sentiment information in massive user-generated content has therefore become a hot topic in both academia and industry. Although machine-learning-based text sentiment classification methods have achieved good results, obtaining labeled samples in practice requires substantial manual effort, whereas unlabeled samples are easy to obtain. How to perform text sentiment classification with a small number of labeled samples and a large number of unlabeled samples has thus become a pressing problem. To address it, this study introduces semi-supervised co-training into text sentiment classification in order to make use of unlabeled samples. First, the study reviews the state of research on text sentiment classification and semi-supervised learning, and identifies current research problems and future research directions. Second, it systematically examines the underlying theory of both fields: the main tasks and main methods of text sentiment classification, and the basic assumptions, effectiveness, and main methods of semi-supervised learning. On this basis, it then studies text sentiment classification methods based on semi-supervised co-training. Because existing work has paid little attention to the effect of data distribution on text sentiment classification, the study considers both balanced and imbalanced data distributions, building an IDSSL-based text sentiment classification model for the balanced case and a hybrid-strategy-based text sentiment classification model for the imbalanced case. Finally, the study applies these methods in practice, selecting two application scenarios, e-commerce and medical social media, to test the effectiveness of the two kinds of co-training-based text sentiment classification methods. The experimental results show that the proposed methods achieve good results under both data distribution conditions, which supports their effectiveness. The study contributes in two ways. On the one hand, it introduces semi-supervised learning into the text sentiment classification problem, extends the basic theory of text sentiment classification and semi-supervised learning, and on that basis builds text sentiment classification models based on semi-supervised co-training. On the other hand, it applies these models to concrete practical problems, extending the range of application of text sentiment classification and semi-supervised learning.
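The abstract does not describe the internals of the IDSSL-based or hybrid-strategy models, so the following is only a minimal sketch of generic two-view co-training in the style of Blum and Mitchell, not the thesis's method. Everything named here is an assumption made for illustration: the `NaiveBayes` class, the `co_train` and `predict` helpers, the two hand-picked token views, and the toy reviews. Each round, every view's classifier moves its single most confident unlabeled document per class into the labeled set, which keeps the pseudo-labeled classes roughly balanced.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial naive Bayes restricted to one feature view (a token set)."""

    def __init__(self, view):
        self.view = set(view)

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.counts[y].update(t for t in doc if t in self.view)
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict_proba(self, doc):
        n = sum(self.prior.values())
        logp = {}
        for c in self.classes:
            lp = math.log(self.prior[c] / n)
            for t in doc:
                if t in self.view:
                    # Add-one (Laplace) smoothing over this view's vocabulary.
                    lp += math.log((self.counts[c][t] + 1)
                                   / (self.totals[c] + len(self.view)))
            logp[c] = lp
        top = max(logp.values())
        z = sum(math.exp(v - top) for v in logp.values())
        return {c: math.exp(v - top) / z for c, v in logp.items()}


def co_train(labeled, unlabeled, view_a, view_b, rounds=5, threshold=0.5):
    """Grow the labeled set with pseudo-labels chosen by each view in turn."""
    docs = [d for d, _ in labeled]
    labels = [y for _, y in labeled]
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        # Both classifiers are trained on the same labeled set at round start.
        clfs = [NaiveBayes(v).fit(docs, labels) for v in (view_a, view_b)]
        for clf in clfs:
            for c in clf.classes:
                if not pool:
                    break
                probs = [clf.predict_proba(d)[c] for d in pool]
                best = max(range(len(pool)), key=probs.__getitem__)
                # Accept only documents the view assigns to class c with confidence.
                if probs[best] > threshold:
                    docs.append(pool.pop(best))
                    labels.append(c)
    return [NaiveBayes(v).fit(docs, labels) for v in (view_a, view_b)]


def predict(clfs, doc):
    """Combine the two views by multiplying their class probabilities."""
    scores = {c: 1.0 for c in clfs[0].classes}
    for clf in clfs:
        for c, p in clf.predict_proba(doc).items():
            scores[c] *= p
    return max(scores, key=scores.get)


# Hypothetical toy data: view A holds adjectives, view B holds verbs.
VIEW_A = {"good", "great", "bad", "awful"}
VIEW_B = {"love", "recommend", "hate", "return"}
labeled = [(["good", "love"], "pos"), (["bad", "hate"], "neg")]
unlabeled = [["great", "love"], ["awful", "hate"],
             ["good", "recommend"], ["bad", "return"]]

clfs = co_train(labeled, unlabeled, VIEW_A, VIEW_B)
print(predict(clfs, ["great", "recommend"]))
print(predict(clfs, ["awful", "return"]))
```

In this toy setup neither "great" nor "recommend" occurs in the two labeled reviews, so a classifier trained on those alone has no evidence about them; the pseudo-labeled documents supply it. Selecting one document per class per view is a design choice in this sketch: taking only the single most confident document overall can skew the class prior after a few rounds, which is one way the data-distribution concern raised in the abstract can show up in practice.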
[Degree-granting institution]: Hefei University of Technology
[Degree level]: Master's
[Year degree conferred]: 2015
[Classification number]: TP391.1;F724.6

[Similar literature]