Research on Pedestrian Detection Methods Based on Convolutional Neural Networks

Published: 2018-07-09 12:58

Topics: convolutional neural network + pedestrian detection; Source: master's thesis, Harbin Institute of Technology, 2017

[Abstract]: Pedestrian detection in traffic environments demands both high detection accuracy and high speed. Traditional methods meet the speed requirement but fall far short in accuracy, while methods based on convolutional neural networks (CNNs) are accurate but computationally expensive. This thesis builds on CNNs to construct a pedestrian detection method that is both more accurate and faster.

First, the thesis summarizes the basics of applying convolutional networks from four aspects: network architecture design, loss function design, regularization methods, and optimization strategies. Taking handwritten digit recognition on the MNIST dataset as an example, it then compares the different variants of the gradient descent optimization algorithm, providing a practical basis for choosing an optimizer.

Second, building on Faster R-CNN, a leading method in object detection, and guided by the summarized design principles and the scale characteristics of pedestrians, the thesis adjusts the network's anchor box settings and its region proposal network (RPN) scheme, and adds a context region pooling layer. The network is implemented with an open-source deep learning framework and evaluated on the Caltech Pedestrian dataset; the experimental results show that the method achieves efficient pedestrian detection.

Next, because a single-scale RPN operates on one fixed input feature map, it cannot achieve good accuracy on large and small pedestrians at the same time. To address this, the thesis designs a multi-scale RPN built on hierarchical (multi-layer) feature maps, together with a matching random scale-and-crop data augmentation method to counter imbalanced training. The experimental results show that, with low-resolution input, this network achieves higher detection accuracy than the single-scale method achieves with high-resolution input.
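The anchor adjustment and the multi-scale RPN described above can be illustrated with a small sketch. The abstract does not state the thesis's actual settings, so every value here is an assumption: a single width-to-height ratio of 0.41 and nine anchor heights growing by a factor of 1.3 from 40 px (values often used for upright pedestrians on Caltech), a hypothetical 80 px split between a stride-8 and a stride-16 feature map, and a 640×480 input. The names `anchor_heights`, `split_by_level`, and `grid_anchors` are illustrative only, not part of any library or of the thesis code.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in input-image pixels


def anchor_heights(base: float = 40.0, step: float = 1.3, count: int = 9) -> List[float]:
    """Geometric series of anchor heights (hypothetical defaults)."""
    return [base * step ** i for i in range(count)]


def split_by_level(heights: List[float], split: float = 80.0) -> Tuple[List[float], List[float]]:
    """Hypothetical multi-scale rule: anchors shorter than `split` pixels go to
    the fine (stride-8) map, the rest to the coarse (stride-16) map."""
    small = [h for h in heights if h < split]
    large = [h for h in heights if h >= split]
    return small, large


def grid_anchors(feat_h: int, feat_w: int, stride: int,
                 heights: List[float], aspect: float = 0.41) -> List[Box]:
    """Place one anchor per height at every feature-map cell.

    Cell (i, j) maps back to the image point ((j + 0.5) * stride, (i + 0.5) * stride);
    every anchor has width = aspect * height, i.e. one fixed pedestrian shape.
    """
    boxes: List[Box] = []
    for i in range(feat_h):
        for j in range(feat_w):
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride
            for h in heights:
                w = aspect * h
                boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes


# Usage on an assumed 640x480 input (feature maps 80x60 at stride 8, 40x30 at stride 16).
heights = anchor_heights()
small, large = split_by_level(heights)
fine = grid_anchors(60, 80, 8, small)
coarse = grid_anchors(30, 40, 16, large)
print(len(small), len(large), len(fine), len(coarse))  # 3 6 14400 7200
```

Using one fixed aspect ratio reflects the fairly consistent shape of upright pedestrians, so the anchor budget goes into covering scale rather than shape. Routing small anchors to the finer map is one plausible reading of "hierarchical feature maps", not the thesis's confirmed design.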
Finally, to address slow detection at test time, the thesis designs compression methods for fully connected and convolutional layers based on singular value decomposition (SVD) and Tucker-2 decomposition, which approximate a high-dimensional fully connected layer and a high-dimensional convolutional layer by a cascade of lower-dimensional fully connected layers and convolutional layers, respectively. The results show that, with a three-stage "train, decompose, fine-tune" compression procedure, the method achieves a 4× speedup on individual layers, a 1.6× overall speedup, and a 4× overall reduction in model size without a noticeable loss in detection accuracy. With GPU acceleration on a GTX 1080, the compressed network runs at 30 frames per second.
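The compression figures can be sanity-checked with a simple cost model. The sketch below counts weights for a dense layer factorized by truncated SVD into two smaller dense layers, and for a k×k convolution factorized by Tucker-2 (over the input- and output-channel modes) into 1×1, k×k, and 1×1 convolutions; it then applies Amdahl's law to relate per-layer and overall speedup. Layer sizes and ranks are hypothetical, biases are ignored, and for stride-1 convolutions the weight count is used as a proxy for the multiply-accumulate count. None of these function names come from the thesis.

```python
"""Back-of-envelope cost model for SVD and Tucker-2 layer compression.

All layer sizes and ranks used below are hypothetical examples, not the
thesis's actual network configuration.
"""


def fc_params(n_in: int, n_out: int) -> int:
    """Weights of a dense layer y = W x, W of shape (n_out, n_in); bias ignored."""
    return n_in * n_out


def svd_fc_params(n_in: int, n_out: int, rank: int) -> int:
    """Truncated SVD W ~= U_k diag(s_k) V_k^T turns one dense layer into two:
    n_in -> rank (weights diag(s_k) V_k^T), then rank -> n_out (weights U_k)."""
    return n_in * rank + rank * n_out


def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a k x k convolution from c_in to c_out channels; bias ignored."""
    return k * k * c_in * c_out


def tucker2_conv_params(c_in: int, c_out: int, k: int, r_in: int, r_out: int) -> int:
    """Tucker-2 on the two channel modes gives three stacked convolutions:
    1x1 (c_in -> r_in), k x k (r_in -> r_out), 1x1 (r_out -> c_out)."""
    return c_in * r_in + k * k * r_in * r_out + r_out * c_out


def amdahl(fraction: float, layer_speedup: float) -> float:
    """Overall speedup when `fraction` of runtime is accelerated by `layer_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / layer_speedup)


def fraction_needed(overall: float, layer_speedup: float) -> float:
    """Invert Amdahl's law: runtime share the accelerated layers must have had."""
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / layer_speedup)


if __name__ == "__main__":
    fc_ratio = fc_params(4096, 4096) / svd_fc_params(4096, 4096, 512)
    conv_ratio = conv_params(512, 512, 3) / tucker2_conv_params(512, 512, 3, 192, 192)
    print(f"FC 4096x4096, rank 512: {fc_ratio:.2f}x fewer weights")  # 4.00x
    print(f"3x3 conv 512->512, ranks 192/192: {conv_ratio:.2f}x fewer weights")  # 4.47x
    print(f"50% of runtime sped up 4x: {amdahl(0.5, 4.0):.2f}x overall")  # 1.60x
```

Read through Amdahl's law, the reported numbers fit together: if only the decomposed layers run 4× faster and the rest of the network is unchanged, a 1.6× end-to-end speedup implies those layers took about half of the original inference time (`fraction_needed(1.6, 4.0)` gives 0.5). This is a back-of-envelope interpretation, not a measurement reported in the thesis.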
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year awarded]: 2017
[Classification number]: TP391.41; TP183

[References]