Research on Context-Based Dimensional Emotion Recognition Methods

Posted: 2018-07-29 07:13
[Abstract]: Emotion plays an important role in people's daily communication, and rich emotion helps a speaker express his or her thoughts. Dimensional emotion can describe complex, subtle, and continuous emotional states by representing different emotional states as different points in a continuous emotion space. Human emotional expression is continuous and multimodal, so context-based methods have attracted increasing attention in dimensional emotion recognition research. Existing context-based emotion recognition methods focus mainly on learning context information from emotional features, neglect the context information carried by emotional states, and seldom consider the emotional context between modalities. This thesis therefore studies the role of context information in dimensional emotion recognition from two aspects: emotional temporal context and emotional modal context. Emotional temporal context refers to how emotion varies over time during expression, including the continuous change of both emotional features and emotional states; emotional modal context refers to the correlation among the emotional information conveyed by multiple modalities. Making full use of these two kinds of context information helps improve the accuracy of dimensional emotion recognition. The specific research contents are as follows.

1) A hierarchical emotional temporal context learning method based on the bidirectional long short-term memory (BLSTM) network is proposed (a minimal sketch follows the abstract). The method consists of three steps. First, high-level features are learned from the input low-level features with a feedforward neural network, which removes the instability of the low-level features and yields high-level features with better representational power. Second, a BLSTM network learns the emotional temporal context of the emotional feature sequence from the high-level features, and this information is used to produce a preliminary recognition of the emotional state. Finally, the emotional temporal context of the emotional label sequence is obtained with an unsupervised learning method and is used to refine the preliminary result into the final recognition. By learning the emotional temporal context of both the feature sequence and the label sequence, the method fully exploits the continuity of emotional expression for dimensional emotion recognition. Experimental results on the AVEC2015 dataset show that using both kinds of emotional temporal context (features and labels) yields better recognition results than using the temporal context of features alone.

2) A dynamic emotional modal context learning method based on an attention model is proposed (an illustrative sketch also follows the abstract). The method consists of two steps. First, the previous method is applied separately to the video and audio data, using their emotional temporal context to obtain preliminary single-modality recognition results. Then, the emotional modal context is learned with an attention model: at every time step the attention model computes, in real time, an attention signal for each modality; this signal serves as the weight of the corresponding modality for emotion recognition, and the modal context vector of the current time step is computed dynamically from these weights. Finally, the learned modal context vectors are fed into a BLSTM network for dimensional emotion recognition. In this way the method learns the emotional modal context dynamically. Experimental results on the AVEC2015 and RECOLA datasets show that, compared with single-modality recognition, the method improves recognition accuracy, and that learning the emotional modal context dynamically with the attention model gives better results than learning it with traditional linear fusion methods.

3) A context-based dimensional emotion recognition prototype system is designed and implemented (a GUI skeleton is sketched after the abstract). The graphical user interface is implemented with PyQt, and the algorithms are implemented with Python, NumPy, CUDA, and Theano. The prototype comprises three modules: data processing, emotional temporal context learning, and emotional modal context learning. The implementation of the prototype verifies the usability of the methods proposed in this thesis.
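To make the three-step pipeline of contribution 1) concrete, below is a minimal NumPy sketch, not the thesis's Theano/CUDA implementation: a feedforward layer maps low-level frame features to high-level features, a bidirectional LSTM reads the high-level sequence in both directions to capture emotional temporal context, and a linear readout gives a preliminary per-frame arousal/valence estimate. All dimensions, the tanh feedforward layer, and the readout are illustrative assumptions, and the unsupervised label-sequence refinement stage is omitted.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_pass(X, W, U, b, reverse=False):
    # One LSTM direction over a sequence X of shape (T, d_in); returns (T, d_h).
    T = X.shape[0]
    d_h = U.shape[0]
    h = np.zeros(d_h)
    c = np.zeros(d_h)
    H = np.zeros((T, d_h))
    steps = reversed(range(T)) if reverse else range(T)
    for t in steps:
        gates = X[t] @ W + h @ U + b          # input, forget, output, candidate gates
        i, f, o, g = np.split(gates, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)
        h = o * np.tanh(c)
        H[t] = h
    return H

def init_lstm(d_in, d_h):
    return (rng.normal(0.0, 0.1, (d_in, 4 * d_h)),   # W: input-to-gate weights
            rng.normal(0.0, 0.1, (d_h, 4 * d_h)),    # U: hidden-to-gate weights
            np.zeros(4 * d_h))                       # b: gate biases

# Toy low-level features for T frames (all sizes are illustrative assumptions).
T, d_low, d_high, d_hid = 50, 40, 32, 16
low_level = rng.normal(size=(T, d_low))

# Step 1: a feedforward layer maps unstable low-level features to high-level ones.
W_ff = rng.normal(0.0, 0.1, (d_low, d_high))
b_ff = np.zeros(d_high)
high_level = np.tanh(low_level @ W_ff + b_ff)

# Step 2: a bidirectional LSTM learns temporal context over the high-level sequence.
forward_params = init_lstm(d_high, d_hid)
backward_params = init_lstm(d_high, d_hid)
H = np.concatenate([lstm_pass(high_level, *forward_params),
                    lstm_pass(high_level, *backward_params, reverse=True)], axis=1)

# Step 3 (preliminary recognition): linear readout to per-frame arousal/valence.
# The thesis further refines this with label-sequence temporal context (not shown).
W_out = rng.normal(0.0, 0.1, (2 * d_hid, 2))
preliminary = H @ W_out
print(preliminary.shape)    # (50, 2); untrained output, for illustration only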
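The attention-based modal context learning of contribution 2) can be sketched in plain NumPy under assumed shapes: at each time step the audio and video representations are scored by a small attention function, the scores are normalised into per-modality weights, and their weighted combination gives the dynamic modal context vector that the thesis then feeds into a BLSTM. The dot-product scoring and the softmax normalisation are assumptions for illustration, not necessarily the exact attention model used in the thesis.

import numpy as np

rng = np.random.default_rng(1)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

T, d = 50, 32                      # frames and (assumed) per-modality feature size
audio = rng.normal(size=(T, d))    # stand-ins for the single-modality temporal-context outputs
video = rng.normal(size=(T, d))

# Attention scoring parameters (illustrative): one attention signal per modality per frame.
W_att = rng.normal(0.0, 0.1, (d,))
b_att = 0.0

context = np.zeros((T, d))
weights = np.zeros((T, 2))
for t in range(T):
    modalities = np.stack([audio[t], video[t]])   # (2, d): audio and video at time t
    scores = modalities @ W_att + b_att           # attention signal for each modality
    alpha = softmax(scores)                       # dynamic modality weights at time t
    weights[t] = alpha
    context[t] = alpha @ modalities               # modal context vector for time t

print(context.shape, weights[:3])                 # the context sequence then feeds a BLSTM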
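For contribution 3), the following is a hypothetical PyQt5 skeleton (the thesis states only "PyQt") showing how the prototype's three modules could be surfaced in a graphical interface; the widget layout and the run_* stubs are illustrative placeholders for the actual Theano/CUDA-backed algorithms.

import sys
from PyQt5.QtWidgets import QApplication, QWidget, QPushButton, QVBoxLayout, QLabel

class EmotionRecognitionUI(QWidget):
    # Minimal window exposing the prototype's three modules as buttons.
    def __init__(self):
        super().__init__()
        self.setWindowTitle("Context-Based Dimensional Emotion Recognition")
        layout = QVBoxLayout(self)
        self.status = QLabel("Ready")
        for name, handler in [("Data processing", self.run_data_processing),
                              ("Temporal context learning", self.run_temporal_context),
                              ("Modal context learning", self.run_modal_context)]:
            button = QPushButton(name)
            button.clicked.connect(handler)
            layout.addWidget(button)
        layout.addWidget(self.status)

    # The three stubs below stand in for the prototype's actual modules.
    def run_data_processing(self):
        self.status.setText("Running data processing ...")

    def run_temporal_context(self):
        self.status.setText("Learning emotional temporal context ...")

    def run_modal_context(self):
        self.status.setText("Learning emotional modal context ...")

if __name__ == "__main__":
    app = QApplication(sys.argv)
    window = EmotionRecognitionUI()
    window.show()
    sys.exit(app.exec_())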
[Degree Granting Institution]: Jiangsu University
[Degree Level]: Master's
[Year of Degree Award]: 2017
[Classification Number]: TP18





