
Research on Violence Detection and Face Recognition Methods Based on Deep Learning

Published: 2018-03-04 15:21

  Topic: public security · Focus: intelligent video analysis · Source: University of Science and Technology of China, 2017 doctoral dissertation · Document type: degree thesis


【Abstract】: With the ongoing construction of "safe city" projects, public security has become a topic of growing public concern, and video surveillance technology has accordingly come into ever wider use. Traditional surveillance systems mainly provide capture and storage, which falls far short of the demand for intelligent analysis. To realize an intelligent video surveillance system, the following key problems must be solved: (1) how to quickly discover abnormal behavior in surveillance video, raise timely alarms, and minimize false positives and false negatives; (2) how to accurately identify and analyze suspicious targets under multiple adverse conditions (such as a single training sample or low resolution); (3) how to guarantee both real-time performance and accuracy of the video analysis system when the data volume is massive. In recent years, deep learning has achieved excellent results in machine vision, speech recognition, natural language processing, and other fields, bringing new opportunities for intelligent video analysis. This dissertation therefore studies the above problems using deep-learning methods. The main work and contributions are as follows:

1. For the problem that abnormal behavior in surveillance video, especially violent fighting, is hard to detect quickly and accurately, a violence-detection method based on a three-dimensional convolutional deep network is proposed. The method performs supervised learning on a large amount of labeled video data, extends the traditional two-dimensional convolution kernel to three dimensions to extract motion information from video, and then combines spatial and motion information to build a deep neural network model that detects violent fighting in surveillance video. Because deep models learn end to end, no complex hand-crafted features need to be designed to describe motion, which reduces the complexity of the task. Experimental results show that the proposed method accurately recognizes violent fighting both in single-scene settings and in crowded environments.

2. For the problem that face images are hard to recognize accurately with only a single training sample, a two-stage voting face-recognition method based on the Kernel Principal Component Analysis Networks (KPCANet) model is proposed. Without using any extra sample data, the unsupervised deep model KPCANet is trained on face images divided into blocks, and features are extracted with the filters learned by KPCA, which makes the extracted features robust to illumination and occlusion and eliminates the effect of local facial deformation on the recognition rate. The predictions of the individual blocks are fused by voting to obtain the final result; when a single round of voting is not decisive, a second stage enlarges each block's candidate set and assigns different weights to different regions, further improving accuracy. Experimental results show that the method performs excellently on four public face datasets and is more accurate than general-purpose methods that use extra datasets; on the unconstrained face dataset LFW-a in particular, it improves accuracy over the SVDL and LGR methods by about 15%.

3. For the problem that faces in surveillance video cannot be recognized accurately because their resolution is too low, a low-resolution face-recognition solution based on convolutional neural network models is proposed, with two models: a multi-scale-input convolutional neural network (CNN) model and a CNN model based on spatial pyramid pooling (SPP). (1) The multi-scale-input CNN model improves the existing "two-step" approach: low-resolution images are upsampled with simple bicubic interpolation, the upsampled images are mixed with high-resolution images as training samples so that the CNN learns a feature space shared by high- and low-resolution images, feature similarity is then measured by cosine distance, and the recognition result is output. Experiments on the CMU PIE and Extended Yale B datasets show that the model is more accurate than the compared methods, with a significant improvement of 2.5%–9.9% over CMDA_BGE, currently the most accurate algorithm. (2) The SPP-based CNN model is an improved "cross-space" method: adding a spatial pyramid pooling layer to the CNN lets the model output a feature vector of constant dimension for input images of any size, and the final recognition result is obtained by comparing the feature similarity between the gallery and the test image. Experiments show that, compared with the multi-scale-input CNN model, this method maintains high accuracy while removing the upsampling step, simplifying image preprocessing, and reducing the number of mapping functions that the traditional "cross-space" method must learn.

4. For the bandwidth consumed by streaming data in surveillance systems and the need for fast, accurate analysis of massive data, a deep-learning framework based on "sea-cloud collaboration" is proposed. The sea-end (edge) system trains on local data with deep learning to obtain a local model, which can detect quickly and respond in real time. The sea end collaborates with cloud training by uploading its local model and a small amount of data; the cloud system uses these local models and data to build and tune a more complex deep model, obtaining a global model with better performance. Experiments on MNIST, CIFAR-10, and LFW show that sea-cloud collaboration effectively reduces the bandwidth consumed by data transmission while preserving the speed of the sea end and the accuracy of the cloud. The above methods have been partly applied in the Chinese Academy of Sciences strategic priority project "Sea-Cloud Collaborative Real-Time Processing System for Massive Network Data Streams (XDA060112030)".
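The 3D convolution underlying the first contribution can be sketched in plain NumPy. This is an illustrative "valid" convolution with a hand-written temporal-difference kernel; the kernel and the toy clip are assumptions for the demo, whereas the thesis's kernels are learned from labeled video:

```python
import numpy as np

def conv3d_valid(clip, kernel):
    """Naive 'valid' 3D convolution of a video clip (T, H, W) with a
    spatio-temporal kernel (kt, kh, kw). Illustration only."""
    T, H, W = clip.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[t, i, j] = np.sum(clip[t:t + kt, i:i + kh, j:j + kw] * kernel)
    return out

# A temporal-difference kernel: responds only where frames change,
# i.e. where there is motion.
motion_kernel = np.zeros((2, 3, 3))
motion_kernel[0] = -1.0 / 9
motion_kernel[1] = 1.0 / 9

clip = np.zeros((5, 8, 8))
clip[2:, 2:5, 2:5] = 1.0           # an object appears at frame 2
resp = conv3d_valid(clip, motion_kernel)
print(resp.shape)                  # (4, 6, 6)
```

The response is zero wherever consecutive frames are identical and peaks at the frame transition where the object appears, which is exactly the motion cue a learned 3D kernel would pick up.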
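The two-stage voting of the second contribution can be sketched as follows. The per-block similarity scores and region weights here are made-up illustrative values, not the thesis's KPCANet outputs:

```python
from collections import Counter

def two_stage_vote(block_scores, weights, k=3):
    """block_scores: per-block candidate lists of (label, similarity),
    sorted by similarity descending.
    Stage 1: each block casts one vote for its top label; a unique
    majority wins outright.  Stage 2 (on a tie): widen each block's vote
    to its top-k candidates and accumulate per-region weights instead."""
    votes = Counter(scores[0][0] for scores in block_scores)
    top = votes.most_common()
    if len(top) == 1 or top[0][1] > top[1][1]:
        return top[0][0]
    weighted = Counter()
    for scores, w in zip(block_scores, weights):
        for label, _ in scores[:k]:
            weighted[label] += w
    return weighted.most_common(1)[0][0]

blocks = [
    [("A", 0.9), ("B", 0.5), ("C", 0.1)],
    [("B", 0.8), ("A", 0.7), ("C", 0.2)],
    [("A", 0.6), ("C", 0.5), ("B", 0.4)],
    [("B", 0.7), ("C", 0.6), ("A", 0.3)],
]
weights = [1.0, 0.5, 1.0, 0.5]     # e.g. discriminative regions weighted higher
print(two_stage_vote(blocks, weights, k=2))   # A
```

Stage 1 ties A and B at two votes each; the second stage breaks the tie by expanding each block's candidate set and weighting the regions, which is the mechanism the thesis credits for the accuracy gain.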
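The improved "two-step" pipeline of model (1) can be sketched end to end. Two simplifications are assumed for the demo: nearest-neighbour upsampling stands in for bicubic interpolation, and flattened pixels stand in for the CNN features learned on the mixed-resolution training set:

```python
import numpy as np

def upsample(img, factor):
    """Nearest-neighbour upsampling: a crude stand-in for the bicubic
    interpolation step of the 'two-step' pipeline."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

def cosine_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify(probe_feat, gallery):
    """Return the gallery identity with the highest cosine similarity."""
    return max(gallery, key=lambda name: cosine_sim(probe_feat, gallery[name]))

# Toy high-resolution gallery "features" (flattened pixels).
hr_alice = np.zeros((8, 8))
hr_alice[:4, :] = 1.0              # top half bright
hr_bob = np.zeros((8, 8))
hr_bob[:, :4] = 1.0                # left half bright
gallery = {"alice": hr_alice.ravel(), "bob": hr_bob.ravel()}

lr_probe = hr_alice[::2, ::2]      # a 4x4 low-resolution capture of alice
probe = upsample(lr_probe, 2).ravel()   # back into the 8x8 feature space
print(identify(probe, gallery))    # alice
```

The point of the shared feature space is exactly this: once the low-resolution probe is mapped into it, plain cosine distance suffices for identification.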
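A minimal spatial pyramid pooling layer, the key ingredient of model (2), fits in a few lines; the pyramid levels (1, 2, 4) are an illustrative choice:

```python
import numpy as np

def spp(feature_map, levels=(1, 2, 4)):
    """Spatial pyramid pooling: max-pool one (H, W) feature map over 1x1,
    2x2 and 4x4 grids, so that any input size yields a fixed-length
    vector of 1 + 4 + 16 = 21 values."""
    pooled = []
    for n in levels:
        for row_block in np.array_split(feature_map, n, axis=0):
            for cell in np.array_split(row_block, n, axis=1):
                pooled.append(cell.max())
    return np.array(pooled)

a = spp(np.random.rand(7, 9))      # feature maps of different sizes...
b = spp(np.random.rand(12, 5))
print(len(a), len(b))              # 21 21  -- constant output dimension
```

Because the output dimension no longer depends on the input size, low-resolution test faces can be fed to the network directly, which is why this model can skip the upsampling step entirely.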
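The sea-cloud division of labour in the fourth contribution can be simulated with toy models. A nearest-centroid classifier stands in for the deep models on both ends, and the margin-based uncertainty rule is an assumption for the sketch:

```python
import numpy as np

def train_local(X, y):
    """'Sea-end' local model: one centroid per class, a toy stand-in for
    a locally trained deep model."""
    return {int(c): X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(model, x):
    """Fast local/global prediction: nearest centroid."""
    return min(model, key=lambda c: np.linalg.norm(x - model[c]))

def is_uncertain(model, x, margin=0.5):
    """Samples whose two nearest centroids are close are the 'small
    amount of data' a sea node uploads to the cloud with its model."""
    d = sorted(np.linalg.norm(x - mu) for mu in model.values())
    return d[1] - d[0] < margin

def merge_on_cloud(models):
    """'Cloud-end' global model: aggregate the uploaded local models
    (here by averaging their centroids)."""
    grouped = {}
    for m in models:
        for c, mu in m.items():
            grouped.setdefault(c, []).append(mu)
    return {c: np.mean(v, axis=0) for c, v in grouped.items()}

node1 = train_local(np.array([[0, 0.0], [0, 0.2], [5, 5.0], [5, 4.8]]),
                    np.array([0, 0, 1, 1]))
node2 = train_local(np.array([[0, 1.0], [0, 0.8], [5, 4.0], [5, 4.2]]),
                    np.array([0, 0, 1, 1]))
cloud = merge_on_cloud([node1, node2])

print(predict(cloud, np.array([0.2, 0.4])))        # 0
print(is_uncertain(node1, np.array([2.5, 2.5])))   # True -- would be uploaded
```

Only the compact local models and the flagged samples cross the network, which is the bandwidth saving the framework is after.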

【Degree-granting institution】: University of Science and Technology of China
【Degree level】: Doctorate
【Year conferred】: 2017
【CLC number】: TP391.41

【Similar literature】

Related journal articles (top 10)

1 周旭东;陈晓红;陈松灿;;半配对半监督场景下的低分辨率人脸识别[J];计算机研究与发展;2012年11期

2 Thomas Gerbeaux;黄毅翔;;低分辨率“兰博” 产品设计新法——低分辨率3D扫描[J];汽车生活;2011年01期

3 李士锦;低分辨率气象卫星云图接收系统[J];无线电通信技术;1993年02期

4 化春键;陈莹;;低分辨率视觉条件下二维工件的精确测量[J];传感器与微系统;2010年08期

5 赖智铭;;适用于低分辨率的逆透视映射算法[J];计算机工程与设计;2013年10期

6 林忠;;低分辨率人脸图像识别性能研究[J];南京工程学院学报(自然科学版);2009年04期

7 戴金波;肖霄;赵宏伟;;基于低分辨率局部二值模式的人脸识别[J];吉林大学学报(工学版);2013年02期

8 王莉;陈健生;贺金平;苏光大;;超低分辨率人脸图像高速重建方法[J];中国科学:信息科学;2013年07期

9 徐冬;;基于镶嵌的低分辨率离屏混合粒子渲染[J];数字技术与应用;2013年09期

10 王晓云;苑玮琦;郭金玉;;低分辨率人耳图像识别方法研究[J];计算机应用研究;2010年11期

Related conference papers (top 1)

1 孙帅;刘伟涛;刘吉英;陈平形;;自适应目标定位在关联成像中的应用[A];第十六届全国量子光学学术报告会报告摘要集[C];2014年

Related newspaper articles (top 4)

1 本报记者 张巍巍;身临其境看电视[N];科技日报;2012年

2 广东 LEON;如何选择扫描分辨率[N];电脑报;2001年

3 特约作者 海涛;绝地反击——Palm Tungsten T详析[N];电脑报;2002年

4 ;无畏的挑战者[N];电脑报;2003年

Related doctoral dissertations (top 1)

1 丁春辉;基于深度学习的暴力检测及人脸识别方法研究[D];中国科学技术大学;2017年

Related master's theses (top 10)

1 赵志国;基于深度学习的低分辨率多姿态人脸识别[D];大连理工大学;2015年

2 王润洲;低分辨率目标检测与跟踪算法及其应用[D];电子科技大学;2015年

3 孔佑磊;低分辨率人脸识别技术及其应用[D];电子科技大学;2016年

4 黄锦添;移动机器人低分辨率视觉识别技术研究[D];广东技术师范学院;2014年

5 杨威;基于半耦合判决性字典学习的极低分辨率人脸识别算法[D];武汉工程大学;2016年

6 李香;基于多输出回归的超低分辨率人脸重构研究[D];哈尔滨工程大学;2012年

7 肖哲;基于统一特征空间的低分辨率人脸识别算法[D];哈尔滨工业大学;2014年

8 尹秀珍;低分辨率苹果果实病害图像识别方法研究[D];西北农林科技大学;2011年

9 周毅;低分辨率人脸图像识别关键技术研究[D];电子科技大学;2011年

10 杨松林;低分辨率下的行人检测[D];武汉理工大学;2012年



Document ID: 1566211


Link to this article: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/1566211.html


