Research on Crowd Density Estimation and Dense Crowd Counting Based on Deep Learning

Published: 2018-06-22 19:16

Topic: deep learning + D-kNN; Source: master's thesis, Zhengzhou University, 2017

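【Abstract】: Crowd density estimation and dense crowd counting are active research topics in computer vision with a wide range of applications. As China's economy continues to grow rapidly and urbanization advances, cities are becoming larger and crowded gatherings more frequent, and terrorist incidents and stampedes have become more common as a result, such as the stampedes on the Bund in Shanghai and in Guyuan, Gansu. Estimating crowd density and counting people accurately from surveillance video is therefore a critical task, and its results are an important reference for crowd detection and the analysis of abnormal crowd behavior. Deep learning uses computational models composed of multiple processing layers; rather than relying on manually designed features, it learns representations of data at multiple levels of abstraction. In recent years, deep learning methods have substantially improved results in speech recognition and in visual object recognition and detection. The convolutional neural network (CNN) is currently the most popular deep learning model. Its main advantages, local connectivity and weight sharing, reduce the complexity of the model and the number of weights, and the resulting structure is highly invariant to translation, rotation, tilt, and scaling. This thesis applies deep learning to crowd density estimation and dense crowd counting in complex scenes. Dense crowds are characterized by very large numbers of people, perspective distortion, and severe occlusion between neighboring individuals. To reduce the impact of these characteristics on density estimation and counting, the thesis first introduces the concept of local density: the image is divided into blocks, and the kNN algorithm is improved by balancing the training samples, adding a distance threshold, and adding weights to the Euclidean distance. The resulting D-kNN algorithm is combined with the gray-level co-occurrence matrix (GLCM) to estimate crowd density. Balancing the samples and setting the threshold prevent misclassifications caused by an excessive distance between the target scene and the training samples, preserving classification performance; the distance weights strengthen the representational power of the features and reduce the influence of high-dimensional features on the classification result. Second, drawing on the CNN's automatic feature extraction and its invariance to scene distortion, the thesis proposes LR-CNN, a dense crowd counting model. LR-CNN can extract the useful information of the original image from segmented and compressed images. Its new LR activation function adds nonlinearity to the network while retaining part of the negative values, which corrects the data distribution and avoids the "dying neuron" problem that ReLU suffers from during training. LR-CNN is trained on the dense blocks identified by the density estimation stage, which reduces the effect of uneven crowd distribution on counting. The algorithms are evaluated on the widely used ShanghaiTech and UCF_CC_50 datasets, with mean absolute error (MAE) and mean squared error (MSE) as the performance criteria. On the test sets, the proposed CNN counting model achieves MAE/MSE of 169.4/258.6, 35.1/57.3, 408.7/460.3, and 2.19/7.63. Experiments and comparisons with other methods show that, in high-density scenes, MAE and MSE are clearly lower than those of previous methods, so counting accuracy improves and the occlusion and perspective problems of dense crowd counting are handled effectively.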
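The abstract combines image blocks with gray-level co-occurrence matrix (GLCM) texture features but does not say which pixel offsets, gray-level quantization, or GLCM statistics are used. The minimal sketch below assumes non-overlapping rectangular blocks, a single horizontal offset, a symmetric normalized GLCM, and four commonly used statistics (contrast, energy, homogeneity, entropy). The function names are hypothetical; the output is one feature vector per block, the kind of input a kNN-style density classifier would receive.

```python
import math
from typing import Dict, List

Image = List[List[int]]  # grayscale image, pixel values already quantized to 0..levels-1


def split_into_blocks(img: Image, block_h: int, block_w: int) -> List[Image]:
    """Cut an image into non-overlapping tiles; partial tiles at the edges are dropped."""
    blocks = []
    for top in range(0, len(img) - block_h + 1, block_h):
        for left in range(0, len(img[0]) - block_w + 1, block_w):
            blocks.append([row[left:left + block_w] for row in img[top:top + block_h]])
    return blocks


def glcm(img: Image, levels: int, dx: int = 1, dy: int = 0) -> List[List[float]]:
    """Normalized, symmetric gray-level co-occurrence matrix for the pixel offset (dx, dy)."""
    counts = [[0.0] * levels for _ in range(levels)]
    h, w = len(img), len(img[0])
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                i, j = img[y][x], img[y2][x2]
                counts[i][j] += 1
                counts[j][i] += 1  # count the pair in both directions -> symmetric matrix
    total = sum(sum(row) for row in counts)
    if total == 0:
        return counts  # block too small for this offset
    return [[c / total for c in row] for row in counts]


def glcm_features(p: List[List[float]]) -> Dict[str, float]:
    """Contrast, energy (angular second moment), homogeneity and entropy of a normalized GLCM."""
    feats = {"contrast": 0.0, "energy": 0.0, "homogeneity": 0.0, "entropy": 0.0}
    for i, row in enumerate(p):
        for j, v in enumerate(row):
            feats["contrast"] += (i - j) ** 2 * v
            feats["energy"] += v * v
            feats["homogeneity"] += v / (1 + abs(i - j))
            if v > 0:
                feats["entropy"] -= v * math.log2(v)
    return feats


# Usage: one texture feature vector per 2x2 block of a tiny 4x4 image with 4 gray levels.
image = [[0, 0, 1, 1], [0, 0, 1, 1], [2, 2, 3, 3], [2, 3, 3, 2]]
vectors = [list(glcm_features(glcm(b, levels=4)).values())
           for b in split_into_blocks(image, 2, 2)]
```

Intuitively, blocks with many overlapping heads tend to show busier texture (higher contrast and entropy, lower energy) than blocks of empty ground, which is why GLCM statistics are a common hand-crafted cue for density classes.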
[Abstract]: Crowd density estimation and dense crowd counting are among the hot topics in computer vision and have a wide range of applications. With the sustained and rapid development of the national economy and continuing urbanization, urban populations are growing and crowded gatherings are becoming more frequent. As a result, terrorist incidents and stampedes are also increasing, such as the stampede on the Shanghai Bund and the stampede in Guyuan, Gansu Province. Estimating crowd density and counting people accurately from surveillance video is therefore an important task, and its results are an important reference for crowd detection and the analysis of abnormal crowd behavior. Deep learning uses computational models composed of multiple processing layers. It does not require manually designed features and can learn representations of data at multiple levels of abstraction. In recent years, the wide application of deep learning methods has significantly improved results in speech recognition and in visual object recognition and detection. The convolutional neural network is currently the most popular deep learning algorithm. Its main advantages are local connectivity and weight sharing, which reduce the complexity of the network model and the number of weights and also make the network highly invariant to translation, rotation, tilt, and scaling. In this thesis, crowd density estimation and dense crowd counting in complex scenes are studied using deep learning. Dense crowds are characterized by very large numbers of people, scene perspective, and severe occlusion between neighboring individuals. To reduce the influence of these characteristics on density estimation and counting, the concept of local density is introduced: the image is divided into blocks, and the kNN algorithm is improved by balancing the samples, adding a distance threshold, and weighting the Euclidean distance.
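The abstract names three changes to kNN (balanced samples, a distance threshold, and Euclidean distance weights) without giving formulas. The sketch below is one plausible reading under stated assumptions, not the thesis's implementation: classes are balanced by random downsampling to the smallest class, hypothetical per-feature weights enter the Euclidean distance, neighbors farther than `max_dist` are discarded (returning `None` when none remain), and the surviving neighbors vote with inverse-distance weights. All function names and parameters are illustrative.

```python
import math
import random
from collections import Counter, defaultdict
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[List[float], str]  # (feature vector, density label such as "low" or "high")


def balance(samples: List[Sample], seed: int = 0) -> List[Sample]:
    """Assumed 'sample homogenization': downsample every class to the smallest class size."""
    by_label = defaultdict(list)
    for s in samples:
        by_label[s[1]].append(s)
    n = min(len(v) for v in by_label.values())
    rng = random.Random(seed)
    out: List[Sample] = []
    for label in sorted(by_label):
        out.extend(rng.sample(by_label[label], n))
    return out


def weighted_euclidean(a: Sequence[float], b: Sequence[float], w: Sequence[float]) -> float:
    """Euclidean distance with a (hypothetical) weight per feature dimension."""
    return math.sqrt(sum(wi * (x - y) ** 2 for x, y, wi in zip(a, b, w)))


def d_knn_predict(train: List[Sample], query: Sequence[float], k: int,
                  weights: Sequence[float], max_dist: float) -> Optional[str]:
    """Thresholded, distance-weighted kNN vote; None if no neighbor is close enough."""
    dists = sorted((weighted_euclidean(x, query, weights), y) for x, y in train)
    neighbors = [(d, y) for d, y in dists[:k] if d <= max_dist]
    if not neighbors:
        return None  # query is far from every training sample: reject rather than guess
    votes: Counter = Counter()
    for d, y in neighbors:
        votes[y] += 1.0 / (d + 1e-9)  # closer neighbors get larger votes
    return votes.most_common(1)[0][0]
```

In the thesis's setting the feature vectors would be GLCM statistics of image blocks. Returning `None` makes the "too far from every sample" case explicit, so a caller can fall back to another rule instead of accepting a misclassification.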
The improved kNN algorithm, D-kNN, is combined with the gray-level co-occurrence matrix to estimate crowd density. Balancing the samples and setting a threshold avoid misclassification caused by a large distance between the target scene and the samples and ensure classification performance, while the distance weights enhance the representational power of the features and reduce the effect of high-dimensional features on the classification results. Secondly, this thesis proposes the LR-CNN dense crowd counting model, which exploits the CNN's automatic feature extraction and its invariance to scene distortion to extract useful information about the original image from segmented and compressed images. The new LR activation function adds nonlinearity to the convolutional neural network while retaining part of the negative values, which corrects the data distribution and solves the problem of neurons dying during ReLU training. The LR-CNN model is trained on the dense blocks identified by crowd density estimation, which reduces the influence of uneven crowd distribution on counting. To verify and analyze the performance of the algorithms, the widely used ShanghaiTech and UCF_CC_50 datasets are adopted, and mean absolute error (MAE) and mean squared error (MSE) are used as evaluation criteria. The experimental results show that the MAE and MSE of the proposed CNN counting model on the test sets are 169.4 and 258.6, 35.1 and 57.3, 408.7 and 460.3, and 2.19 and 7.63, respectively. In dense crowd counting, MAE and MSE are clearly lower than those of previous methods and counting accuracy is improved, providing an effective solution to the problems caused by occlusion and perspective in dense crowd counting.
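The abstract says only that the LR activation keeps part of the negative values so that neurons do not "die" as they can with ReLU; it does not give the formula. The sketch below assumes a leaky-ReLU-style form with a hypothetical fixed slope `alpha = 0.01` for non-positive inputs, and uses a one-neuron gradient-descent toy (`train_bias`, also hypothetical) to show the mechanism: under ReLU, a neuron whose pre-activation is negative receives zero gradient and never recovers, whereas a small negative-side slope lets it drift back into the active region.

```python
from typing import Callable


def lr_activation(x: float, alpha: float = 0.01) -> float:
    """Assumed LR form: identity for positive inputs, slope alpha for the rest."""
    return x if x > 0 else alpha * x


def lr_grad(x: float, alpha: float = 0.01) -> float:
    """Derivative of lr_activation (the kink at 0 takes the negative-side slope)."""
    return 1.0 if x > 0 else alpha


def relu_grad(x: float) -> float:
    """Derivative of ReLU: exactly zero for every non-positive input."""
    return 1.0 if x > 0 else 0.0


def train_bias(grad: Callable[[float], float], b: float = -1.0, upstream: float = -1.0,
               lr: float = 1.0, steps: int = 200) -> float:
    """Gradient descent on the bias of one neuron whose input is 0 (pre-activation = b).

    upstream is dLoss/dOutput; a negative value means the loss wants a larger output.
    """
    for _ in range(steps):
        b -= lr * upstream * grad(b)  # chain rule: dLoss/db = upstream * activation'(b)
    return b


# Usage: the ReLU neuron stays stuck at b = -1.0; the LR neuron climbs back above zero.
stuck, recovered = train_bias(relu_grad), train_bias(lr_grad)
```

Whether the thesis's LR function uses a fixed, learned, or randomized negative slope cannot be determined from the abstract; the fixed slope here is the simplest choice that has the stated property.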
Experiments and comparisons with other methods show that the mean absolute error and mean squared error are reduced in high-density scenes and the accuracy of dense crowd counting is improved.
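MAE and MSE, as used in the abstract, are computed over the per-image predicted and ground-truth counts of a test set. The abstract calls MSE the mean squared error; in the crowd-counting literature that reports ShanghaiTech and UCF_CC_50 results, the quantity labelled MSE is commonly the square root of the mean squared error, so this minimal sketch exposes both through a hypothetical `root` flag. Which variant the thesis used is not stated.

```python
import math
from typing import Sequence


def _check(pred: Sequence[float], true: Sequence[float]) -> None:
    """Both metrics need two non-empty sequences of equal length."""
    if len(pred) != len(true) or not true:
        raise ValueError("need two non-empty sequences of equal length")


def mae(pred: Sequence[float], true: Sequence[float]) -> float:
    """Mean absolute error between predicted and ground-truth head counts."""
    _check(pred, true)
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def mse(pred: Sequence[float], true: Sequence[float], root: bool = True) -> float:
    """Mean squared error; with root=True, its square root (the usual crowd-counting 'MSE')."""
    _check(pred, true)
    m = sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)
    return math.sqrt(m) if root else m


# Usage: three test images with predicted and annotated counts (illustrative numbers).
predicted, annotated = [10, 20, 30], [12, 18, 30]
scores = (mae(predicted, annotated), mse(predicted, annotated))
```

Because squaring weights large errors more heavily, MSE (in either form) is always at least as large as MAE on the same predictions, which is consistent with every MAE/MSE pair reported above.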
【Degree-granting institution】: Zhengzhou University
【Degree level】: Master's
【Year degree awarded】: 2017
【Classification number】: TP391.41

【References】