Sentiment Analysis of Chinese Reviews Based on Multi-Feature Fusion and LSTM Neural Networks

Published: 2017-12-26 22:30

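Keywords: sentiment analysis of Chinese reviews based on multi-feature fusion and LSTM neural networks　Source: Taiyuan University of Technology, 2017 master's thesis　Type: degree thesis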

[Abstract]: With the rapid development of the mobile Internet, online shopping has become part of daily life, and e-commerce sites now host large volumes of product reviews. Mining the sentiment orientation of these reviews gives merchants information about their products that supports sales decisions, and it helps buyers judge products objectively before deciding to purchase. Given the sheer volume of review text, reading reviews manually to determine their sentiment is time-consuming and laborious, so using techniques from artificial intelligence to analyze the sentiment of product reviews automatically has become an important and meaningful research topic. Existing sentiment analysis methods fall mainly into rule-based methods, machine learning methods, and deep neural network methods. With the development of big data technology and the growing diversity of language forms, deep neural networks have become the mainstream technology in natural language processing and have achieved major breakthroughs in sentiment analysis. This thesis studies sentiment analysis methods based on deep neural networks. Its main work is as follows.
(1) To address the high dimensionality and lack of semantic relatedness that arise when representing text for sentiment analysis, the thesis adopts a word embedding mechanism: a neural network language model is trained on a large body of review text, and text is then expressed as distributed representations, which maps it to low-dimensional real-valued vectors. This representation also expresses the semantic information of the text, which helps the neural network model understand the text better.
(2) To reflect the particular nature of the sentiment analysis task, the thesis proposes a new text representation, the multi-feature word vector, which is an optimization of the distributed representation. Because words that carry sentiment elements influence the overall sentiment polarity of a text, a sentiment-element lexicon is built to capture such words, and a sentiment feature vector is constructed for each word to express its sentiment elements. This vector is then fused with the word's distributed-representation vector to form the multi-feature word vector. Text represented in this way contains semantic information and also captures sentiment features, which makes it better suited to sentiment analysis.
(3) Sentiment analysis is in essence a classification problem, and computing feature weights is an important step in classification. On this basis, the thesis further optimizes the multi-feature word vector by borrowing a feature weighting algorithm to assign a weight to each vector, thereby highlighting the words that matter more for classification. The resulting weight-allocated multi-feature word vector improves on traditional text representations in two respects and enriches the expression of text semantics. Used as the input to a neural network classifier, it is better suited to deep feature capture and sentiment classification.
(4) The thesis uses an LSTM neural network model to mine the deep features of text. Text is represented with weight-allocated multi-feature word vectors and fed to the LSTM, whose ability to learn long-distance dependencies is used to capture sequence features and contextual dependencies. Finally, comparative experiments against the conventional LSTM-based sentiment analysis method verify the effectiveness of the proposed improvements.
Across these four pieces of work, the thesis takes the characteristics of the sentiment analysis task fully into account and introduces prior knowledge, such as sentiment lexicon resources and feature weight information, into the neural network model. The weight-allocated multi-feature word vector built on this basis captures features better suited to sentiment analysis, and the LSTM model captures richer feature combinations, which together improve the sentiment classifier's understanding of text and its classification accuracy.
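The representation described in items (2) and (3) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the toy embeddings, the four sentiment-element categories, the lexicon entries, and the choice of a smoothed TF-IDF weight are all assumptions made for illustration, since the abstract names neither the lexicon's categories nor the specific feature weighting algorithm.

```python
import math
from collections import Counter

# Hypothetical toy embeddings; the thesis trains them on a large review corpus.
EMBEDDINGS = {
    "屏幕": [0.2, -0.1, 0.4],   # "screen"
    "很": [0.0, 0.3, 0.1],      # "very"
    "清晰": [0.5, 0.2, -0.3],   # "clear"
    "不": [-0.2, 0.1, 0.0],     # "not"
    "喜欢": [0.4, 0.4, 0.2],    # "like"
}
# Assumed sentiment-element categories and lexicon entries (illustrative only).
CATEGORIES = ["positive", "negative", "negation", "degree"]
LEXICON = {"清晰": "positive", "喜欢": "positive", "不": "negation", "很": "degree"}


def sentiment_features(word):
    """One-hot vector over CATEGORIES; all zeros for words outside the lexicon."""
    label = LEXICON.get(word)
    return [1.0 if c == label else 0.0 for c in CATEGORIES]


def tfidf_weights(doc, corpus):
    """Smoothed TF-IDF weight for each word of `doc` (an assumed weighting scheme)."""
    tf = Counter(doc)
    n_docs = len(corpus)
    weights = {}
    for word, count in tf.items():
        df = sum(1 for d in corpus if word in d)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        weights[word] = (count / len(doc)) * idf
    return weights


def weighted_multi_feature_vectors(doc, corpus):
    """Concatenate embedding and sentiment features, then scale by the word weight."""
    weights = tfidf_weights(doc, corpus)
    dim = len(next(iter(EMBEDDINGS.values())))
    vectors = []
    for word in doc:
        base = EMBEDDINGS.get(word, [0.0] * dim)   # zero vector for unknown words
        fused = base + sentiment_features(word)    # list concatenation = feature fusion
        vectors.append([weights[word] * x for x in fused])
    return vectors


# Two tokenized toy reviews; the first is converted to a sequence of 7-dim vectors.
corpus = [["屏幕", "很", "清晰"], ["不", "喜欢", "屏幕"]]
sequence = weighted_multi_feature_vectors(corpus[0], corpus)
```

Here `sequence` holds one 7-dimensional vector per token (3 embedding dimensions plus 4 sentiment dimensions), which is the kind of per-word input sequence an LSTM classifier would consume. Concatenation is only one plausible reading of "feature fusion"; the thesis may combine the two vectors differently.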
[Degree-granting institution]: Taiyuan University of Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.1;TP18
