Research on Deep Learning-Based Stance Analysis of Social Media Text

Published: 2018-05-05 00:00

Topics: stance analysis + deep learning; source: Harbin Institute of Technology, 2017 master's thesis

[Abstract]: With the rapid development of Internet technology and the spread of smart devices, more and more users express their stances and opinions on all kinds of events on social media platforms. Users' stances toward specific targets and events are of great value to the decision-making of businesses and government agencies. Traditional sentiment analysis only classifies the surface sentiment of a text as positive or negative, which makes it hard to uncover the user's stance toward a specific event or topic. Research on topic-specific stance analysis of social media text therefore has significant scientific value and broad application prospects. Existing stance analysis methods fall into two categories: machine learning methods based on feature engineering, and deep learning methods. Feature-engineering methods require constructing and selecting a large number of features, tend to demand considerable linguistic knowledge, and often suffer from feature sparsity caused by insufficient training samples. Deep learning methods often treat stance analysis as a plain text classification problem; they rarely draw on the background knowledge carried by word embeddings of social media text, and they do not make effective use of the topic-specific information in the task. To address these problems, this thesis uses social-text word embeddings as background knowledge and combines them with the attention mechanism of deep memory networks to study deep learning methods for stance analysis of social media text. It first studies a stance analysis method based on a convolutional neural network built on word embeddings pre-trained on large-scale social text. Experiments on the SemEval English stance analysis dataset and the NLPCC Chinese stance analysis dataset show that the method achieves an F-score of 0.6752 on SemEval and 0.7036 on NLPCC. It outperforms the best evaluation team on several sub-topics, and its overall performance ranks second in both the English and the Chinese evaluation tasks. The analysis also finds that, compared with initialization schemes such as random assignment, word embeddings pre-trained on social media text effectively improve the model's stance analysis performance.
To address the ineffective use of topic-specific information in existing work, the thesis further proposes a stance analysis model that uses the attention mechanism of a deep memory network to assess how strongly each component of the text relates to the given topic. The model reads the word-embedding representations of the text and the topic, combines the memory and attention mechanisms of the deep memory network, stacks multiple network layers to learn multi-level text representations, and infers the stance the text holds toward the topic. Experimental results show an average F-score of 0.6821 on the SemEval dataset, 0.39 percentage points above the transfer learning model that performed best in that evaluation, and an average F-score of 0.7140 on the NLPCC dataset, 0.34 percentage points above the best model in that evaluation. These results demonstrate the effectiveness of the proposed methods for stance analysis of social media text.
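The stacked attention over text and topic embeddings described in the abstract can be illustrated with a rough sketch. This is not the thesis's implementation: the abstract does not give architectural details, so the mean-pooled topic query, dot-product relevance scoring, additive query update between layers, the number of hops, the three-way output layer, and every function and variable name below are assumptions made for illustration. Random vectors stand in for pre-trained word embeddings, and the weights are untrained.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attention_hop(memory, query):
    """One hop: weight each word vector by its relevance to the query."""
    weights = softmax([dot(word_vec, query) for word_vec in memory])
    dim = len(query)
    # Attention-weighted summary of the text words held in memory.
    summary = [sum(w * vec[i] for w, vec in zip(weights, memory))
               for i in range(dim)]
    # Assumed update rule: the next hop queries with query + summary.
    next_query = [q + s for q, s in zip(query, summary)]
    return next_query, weights


def stance_forward(text_vecs, topic_vecs, class_weights, hops=3):
    """Run several attention hops, then score each stance class."""
    dim = len(topic_vecs[0])
    # Assumed topic representation: mean of the topic word embeddings.
    query = [sum(v[i] for v in topic_vecs) / len(topic_vecs)
             for i in range(dim)]
    attention_per_hop = []
    for _ in range(hops):
        query, weights = attention_hop(text_vecs, query)
        attention_per_hop.append(weights)
    logits = [dot(row, query) for row in class_weights]
    return softmax(logits), attention_per_hop


# Toy inputs: random stand-ins for embeddings and an untrained output layer.
rng = random.Random(0)
DIM = 4
text_vecs = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(5)]   # 5 text words
topic_vecs = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(2)]  # 2 topic words
class_weights = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(3)]  # 3 stance classes (assumed)

probs, attention_per_hop = stance_forward(text_vecs, topic_vecs, class_weights)
```

In this sketch, `probs` is a distribution over the assumed stance classes and `attention_per_hop[k]` shows which text words the k-th layer attended to for the given topic. A trained model would learn the embeddings and layer parameters rather than sample them.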
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.1
