Speech Emotion Recognition Strategy Based on Deep Belief Networks

Published: 2018-11-16 15:58

[Abstract]: With ongoing research in affective computing, speech emotion recognition has attracted wide attention in recent years. Achieving it matters both for advancing psychology and for building a more harmonious human-computer environment. Speech emotion recognition extracts emotion-related feature parameters from speech, assembles them into feature vectors, and passes those vectors to a classification model that determines the emotion category. Improving the recognition performance of that classification model has long been a central research focus. To this end, this thesis proposes a speech emotion recognition strategy based on a deep belief network (DBN). By building an artificial neural network with multiple hidden layers, the DBN gains an efficient feature-learning ability. This addresses two weaknesses of traditional neural networks, namely poor feature selection and limited capacity to represent complex functions; it also improves generalization on complex classification problems and shortens training convergence time, which together raise recognition performance. The strategy is implemented in MATLAB. A speech emotion dataset was collected, and the strategy was compared with a speech emotion recognition method based on a BP (backpropagation) neural network classifier on three metrics: recall, precision, and F1 score. A series of experiments shows that the proposed strategy achieves a higher average recall, average precision, and F1 score than the BP neural network. Based on this strategy, a prototype mobile speech emotion recognition system was developed using a client/server (C/S) architecture.
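The abstract does not spell out how the DBN is trained. The usual construction stacks restricted Boltzmann machines (RBMs), pretrains them greedily one layer at a time with contrastive divergence, and then fine-tunes the whole stack with backpropagation. The Python sketch below illustrates only the pretraining half, under stated assumptions: binary-unit RBMs, one-step contrastive divergence (CD-1), input features already scaled to [0, 1], and hypothetical names (`RBM`, `pretrain_dbn`, `dbn_features`). Layer sizes, learning rate, and epoch count are illustrative and are not the thesis's MATLAB settings.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class RBM:
    """Bernoulli-Bernoulli restricted Boltzmann machine trained with CD-1."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        # Small random weights and zero biases (a common initialization).
        self.W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases
        self.c = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        # p(h_j = 1 | v) = sigmoid(c_j + sum_i v_i * W_ij)
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(len(self.b))))
                for j in range(len(self.c))]

    def visible_probs(self, h):
        # p(v_i = 1 | h) = sigmoid(b_i + sum_j W_ij * h_j)
        return [sigmoid(self.b[i] + sum(self.W[i][j] * h[j] for j in range(len(self.c))))
                for i in range(len(self.b))]

    def sample(self, probs):
        return [1.0 if self.rng.random() < p else 0.0 for p in probs]

    def cd1_update(self, v0, lr):
        h0 = self.hidden_probs(v0)                # positive phase, driven by data
        v1 = self.visible_probs(self.sample(h0))  # one Gibbs step: reconstruction
        h1 = self.hidden_probs(v1)                # negative phase, driven by model
        for i in range(len(self.b)):
            for j in range(len(self.c)):
                self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
            self.b[i] += lr * (v0[i] - v1[i])
        for j in range(len(self.c)):
            self.c[j] += lr * (h0[j] - h1[j])


def pretrain_dbn(data, layer_sizes, epochs=50, lr=0.1, seed=0):
    """Greedy layer-wise pretraining: each RBM learns from the layer below it."""
    rng = random.Random(seed)
    layers, inputs, n_in = [], data, len(data[0])
    for n_hidden in layer_sizes:
        rbm = RBM(n_in, n_hidden, rng)
        for _ in range(epochs):
            for v in inputs:
                rbm.cd1_update(v, lr)
        layers.append(rbm)
        # Hidden-unit probabilities become the next RBM's training data.
        inputs = [rbm.hidden_probs(v) for v in inputs]
        n_in = n_hidden
    return layers


def dbn_features(layers, v):
    """Propagate one input vector up through the pretrained stack."""
    for rbm in layers:
        v = rbm.hidden_probs(v)
    return v


if __name__ == "__main__":
    toy = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]] * 4
    stack = pretrain_dbn(toy, [4, 2], epochs=30)
    print(dbn_features(stack, toy[0]))
```

To turn the pretrained stack into a classifier, one would add an output layer (for example softmax over the emotion classes) on top of `dbn_features` and fine-tune all weights with backpropagation on labelled utterances; that step is omitted here to keep the sketch short.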
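The comparison with the BP network rests on average recall, average precision, and F1. The sketch below shows one plausible way to compute them, assuming macro averaging (every emotion class weighted equally) and taking F1 as the harmonic mean of the two averages; the thesis may average differently, and `macro_scores` is a hypothetical helper rather than the author's code.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return (average recall, average precision, F1) over all emotion classes."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("no labels to score")
    true_count = Counter(y_true)  # how often each class really occurs
    pred_count = Counter(y_pred)  # how often each class is predicted
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    # Per-class recall = correct / actual; per-class precision = correct / predicted.
    recall = [correct[c] / true_count[c] if true_count[c] else 0.0 for c in labels]
    precision = [correct[c] / pred_count[c] if pred_count[c] else 0.0 for c in labels]
    r = sum(recall) / len(labels)
    p = sum(precision) / len(labels)
    # F1 is the harmonic mean of the averaged precision and recall.
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return r, p, f1


if __name__ == "__main__":
    y_true = ["angry", "angry", "happy", "sad"]
    y_pred = ["angry", "happy", "happy", "sad"]
    print([round(x, 3) for x in macro_scores(y_true, y_pred)])  # [0.833, 0.833, 0.833]
```

Macro averaging keeps a rarely occurring emotion from being swamped by a frequent one, which is one reason per-class averages are often reported for emotion corpora with unbalanced classes.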
The client provides recording, voice playback, voice upload, and result display; the server performs feature parameter extraction and emotion recognition. The user records speech through the microphone and uploads it to the server for analysis, and the server returns the emotion recognition result to the client.
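The abstract does not list which feature parameters the server extracts. Purely as an illustration, the sketch below frames the uploaded signal (400-sample frames with a 160-sample hop, i.e. 25 ms / 10 ms at an assumed 16 kHz sampling rate) and summarizes two simple frame-level measures, short-time energy and zero-crossing rate, by their mean and standard deviation. The feature choice, frame sizes, and the names `extract_features` and `handle_upload` are all assumptions; a real system would typically add pitch and MFCC statistics.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split mono samples into overlapping frames; a trailing partial frame is dropped."""
    return [samples[s:s + frame_len] for s in range(0, len(samples) - frame_len + 1, hop)]


def short_time_energy(frame):
    # Mean squared amplitude of the frame.
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    # Fraction of adjacent sample pairs whose signs differ.
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def mean_std(values):
    m = sum(values) / len(values)
    return m, math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def extract_features(samples, frame_len=400, hop=160):
    """Utterance-level vector: [energy mean, energy std, ZCR mean, ZCR std]."""
    frames = frame_signal(samples, frame_len, hop)
    if not frames:
        raise ValueError("signal shorter than one frame")
    energy = [short_time_energy(f) for f in frames]
    zcr = [zero_crossing_rate(f) for f in frames]
    return [*mean_std(energy), *mean_std(zcr)]


def handle_upload(samples, classify):
    """Hypothetical server handler: features -> classifier -> label sent to the client."""
    return classify(extract_features(samples))
```

Whatever features are chosen, the aggregation step is what turns a variable-length recording into a fixed-length vector that a classifier can accept; `classify` stands in for whichever trained model the server hosts.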
[Degree-granting institution]: Dalian University of Technology
[Degree level]: Master's
[Year conferred]: 2014
[Classification number]: TN912.3
