Research on Cross-Language Text Sentiment Classification

Published: 2018-08-06 08:19

【Abstract】: Text sentiment classification aims to use computational techniques to determine the subjective sentiment orientation expressed in a text. By mining and analyzing the interests and attitudes of the text's authors, it provides decision makers with valuable reference information. Because high-quality annotated corpora, sentiment lexicons, and similar resources are unevenly distributed across languages, research on cross-language text sentiment classification has emerged. Cross-language text sentiment classification uses labeled corpora in a source language to support sentiment orientation analysis in a target language. Its core problem is how to map the source and target languages into a common language space. According to how this mapping is performed, existing approaches fall into three categories: those that use bilingual dictionaries, those that use parallel corpora to establish correspondences between the two languages, and those that use machine translation. This thesis explores each of the three approaches. Its main contributions are as follows:

(1) A monolingual text sentiment analysis method, SLAB, is proposed within an active learning framework. Its sampling strategy builds on uncertainty sampling and uses a sentiment lexicon: in addition to the most uncertain samples, it also selects samples with high sentiment scores, which compensates for the weaknesses of uncertainty sampling and thereby improves classifier accuracy. Applying this sampling strategy yields a cross-language text sentiment classification method, AL-CLSC. The method first uses machine translation to translate English text into Chinese, then uses active learning to select "good" training samples, and through iterative training obtains a reasonably good Chinese text sentiment classifier. Furthermore, the thesis combines AL-CLSC with a graph-structured model to obtain an improved method, GAL-CLSC, intended to address the information loss, duplication, and bias that machine translation of the training corpus can introduce. Experimental results show that the improved method clearly raises classifier accuracy across different training sets.

(2) Given the strong performance of neural networks on text sentiment classification in recent years, the thesis proposes two deep canonical correlation methods for cross-language text sentiment classification, DCCA-RNN and DCCA-CNN, which incorporate an RNN and a CNN, respectively. Building on the theory of deep canonical correlation analysis, both methods use parallel corpora and learn, through an RNN or a CNN, the nonlinear relationship between the two language spaces. They then use canonical correlation in the mapped shared feature space to perform cross-language text sentiment classification.
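The sampling idea behind SLAB and AL-CLSC can be sketched roughly as follows. The abstract does not say how uncertainty and lexicon scores are computed or combined, so everything here is an assumption made for illustration: the function names, the binary uncertainty measure, the absolute-sum lexicon score, and the fixed quotas `n_uncertain` and `n_sentiment` are all hypothetical and are not the thesis's actual formulation.

```python
from typing import Dict, List, Sequence


def uncertainty(p_positive: float) -> float:
    """Binary uncertainty: 1.0 when p = 0.5, 0.0 when p is 0 or 1 (assumed measure)."""
    return 1.0 - abs(2.0 * p_positive - 1.0)


def lexicon_score(tokens: Sequence[str], lexicon: Dict[str, float]) -> float:
    """Absolute sum of word polarities found in the lexicon (hypothetical scoring)."""
    return abs(sum(lexicon.get(tok, 0.0) for tok in tokens))


def select_samples(
    pool: Sequence[Sequence[str]],
    probs: Sequence[float],
    lexicon: Dict[str, float],
    n_uncertain: int,
    n_sentiment: int,
) -> List[int]:
    """Return indices into the unlabeled pool: first the most uncertain
    samples, then the remaining samples with the largest lexicon scores."""
    indices = range(len(pool))
    by_uncertainty = sorted(
        indices, key=lambda i: uncertainty(probs[i]), reverse=True
    )
    chosen = by_uncertainty[:n_uncertain]
    taken = set(chosen)
    rest = [i for i in indices if i not in taken]
    by_sentiment = sorted(
        rest, key=lambda i: lexicon_score(pool[i], lexicon), reverse=True
    )
    return chosen + by_sentiment[:n_sentiment]
```

In an active learning loop, the returned indices would be labeled, moved from the pool into the training set, and the classifier retrained before the next round of selection. For the cross-language case, `pool` would hold tokenized Chinese text produced by machine translation.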
【Degree-granting institution】: Huaqiao University (华侨大学)
【Degree level】: Master's
【Year degree awarded】: 2017
【Classification number】: TP391.1

【Similar literature】