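Research on Facial Emotion Recognition Based on Deep Learning
Published: 2018-03-15 23:25
Topic: expression recognition  Focus: features  Source: Harbin Institute of Technology, 2017 master's thesis  Document type: degree thesis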
[Abstract]: With the development of artificial intelligence, emotion recognition is being applied in more and more scenarios. Typical ones include advertising effectiveness evaluation, product evaluation, video analysis, medical rehabilitation, safe driving, and affective robots. Emotion recognition is currently advancing especially fast in human-computer interaction, particularly in safe driving and affective robot applications. Enabling machines to understand people better and to serve them more intelligently and humanely is fundamental to the recent artificial intelligence revolution. Once machines have gradually acquired sufficient emotional cognition, they can upgrade the user experience in human-computer interaction and, ultimately, integrate into human life as ordinary people do. In a broad sense, emotion recognition can be performed through facial expressions, voice and intonation, or EEG capture. The most technically mature and widely used approach today is facial expression recognition, which uses computer vision algorithms to recognize facial expression movements and infer basic emotions such as happiness, anger, and sadness.

Because people differ in how strongly they express emotion, automatic facial expression recognition (Facial Expression Recognition, FER) remains a challenging and interesting problem in computer vision. Despite efforts to develop various methods for FER, existing methods lack generality when handling images that are unlabeled or captured in natural environments. Most existing methods rely on handcrafted features (for example histograms of oriented gradients, local binary patterns, and Gabor feature descriptors) combined with a classifier (such as a support vector machine), whose hyperparameters are tuned to give the best recognition accuracy on a single database or a small set of similar databases. Different feature descriptors differ in how well they represent expression images under different backgrounds, so the most suitable descriptor has to be found for each particular kind of background image, which greatly increases the complexity of the work. Deep learning, by contrast, learns facial features automatically and is an end-to-end model: feature learning and classification are completed within a single model.

Based on the inception structure proposed by Google, this thesis proposes a deep neural network architecture that removes the need to find a different feature descriptor for each kind of background image, and slims the model down so that it can be deployed on mobile devices. Specifically, the network consists of two convolutional layers, each followed by max pooling, and then three inception layers. The network is a single-component architecture that takes registered facial images as input and classifies each one as one of the six basic expressions or the neutral expression. Comprehensive experiments were carried out on seven publicly available facial expression databases (Multi-PIE, MMI, CK+, DISFA, FERA, SFEW, and FER2013). The main comparison is between the generalization ability of methods based on traditional features and that of deep learning methods across databases; the experiments show that the deep learning methods generalize better. In addition, a comparison with current mainstream models such as VGG, GoogLeNet, and ResNet on the expression recognition task further shows that the inception-based structure can keep the model as small as possible while maintaining recognition accuracy.
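The layer layout described in the abstract (two convolutional layers, each followed by max pooling, then three inception layers, ending in a classifier over six basic expressions plus neutral) can be traced as a shape and parameter walk-through. The following is only a minimal Python sketch under stated assumptions: the 48x48 grayscale input, the 3x3 kernels, every channel width, the global average pooling step, and the helper names `conv_params`, `inception`, and `summarize` are illustrative choices, not settings taken from the thesis.

```python
# Shape and parameter walk-through of a "2 conv + 3 inception" classifier.
# All sizes below are illustrative assumptions, not the thesis's settings.

NUM_CLASSES = 7  # six basic expressions + neutral, as stated in the abstract


def conv_params(c_in, c_out, k):
    """Weights plus biases of a k x k convolution."""
    return k * k * c_in * c_out + c_out


def inception(c_in, b1, r3, b3, r5, b5, proj):
    """GoogLeNet-style module with four parallel branches.

    Returns (parameter count, output channels). Branch outputs are
    concatenated along the channel axis; spatial size is unchanged.
    """
    params = (
        conv_params(c_in, b1, 1)                              # 1x1 branch
        + conv_params(c_in, r3, 1) + conv_params(r3, b3, 3)   # 1x1 -> 3x3
        + conv_params(c_in, r5, 1) + conv_params(r5, b5, 5)   # 1x1 -> 5x5
        + conv_params(c_in, proj, 1)                          # pool -> 1x1
    )
    return params, b1 + b3 + b5 + proj


def summarize(size=48, channels=1):
    """Return a list of (layer name, channels, spatial size, parameters)."""
    rows = []
    # Two convolutional layers, each followed by 2x2 max pooling.
    for name, c_out in (("conv1", 64), ("conv2", 128)):
        rows.append((name, c_out, size, conv_params(channels, c_out, 3)))
        channels, size = c_out, size // 2  # 'same' conv, then pooling halves size
        rows.append((name + "_pool", channels, size, 0))
    # Three inception modules (hypothetical branch widths).
    configs = [(64, 96, 128, 16, 32, 32),
               (128, 128, 192, 32, 96, 64),
               (192, 96, 208, 16, 48, 64)]
    for i, cfg in enumerate(configs, start=1):
        params, channels = inception(channels, *cfg)
        rows.append((f"inception{i}", channels, size, params))
    # Global average pooling (assumed), then a linear layer over the 7 classes.
    rows.append(("fc", NUM_CLASSES, 1, channels * NUM_CLASSES + NUM_CLASSES))
    return rows


if __name__ == "__main__":
    rows = summarize()
    for name, ch, size, params in rows:
        print(f"{name:12s} channels={ch:4d} size={size:2d} params={params}")
    print("total parameters:", sum(r[3] for r in rows))
```

Running the script prints one row per layer and a total, which makes it easy to see how the 1x1 reduction branches keep the parameter count of each inception module down. That is the kind of saving the abstract refers to when it speaks of slimming the model for mobile use.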
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.41; TP181
Article ID: 1617320
Article link: https://www.wllwen.com/kejilunwen/zidonghuakongzhilunwen/1617320.html