Research on Pedestrian Detection Based on Region-Based Convolutional Neural Networks

Published: 2018-04-04 02:38

Topic: pedestrian detection　Focus: convolutional neural networks　Source: Hangzhou Dianzi University, 2017 master's thesis

[Abstract]: Pedestrian detection has long been both a hot topic and a hard problem in machine vision, and it is increasingly used in artificial intelligence applications such as intelligent surveillance, intelligent transportation, and intelligent robotics. In traffic safety, for example, pedestrian detection can determine in advance whether pedestrians are ahead of or near a vehicle and trigger emergency braking as soon as one is found, which effectively avoids collisions with pedestrians and reduces casualties. Pedestrian detection differs from general object detection because pedestrians are non-rigid targets. In real scenes, varied clothing, constantly changing body poses, complex and changing backgrounds, insufficient lighting, and mutual occlusion between pedestrians make the task highly challenging. Many effective pedestrian detection algorithms have been proposed, the most representative of which uses the Histogram of Oriented Gradients (HOG) feature, but its detection performance is still unsatisfactory against more complex backgrounds. In recent years deep learning has returned to prominence, and deep convolutional neural networks in particular have achieved major breakthroughs in pattern recognition, demonstrating their strength in feature extraction. Building on a thorough study of pedestrian detection techniques and of deep learning, especially deep convolutional neural network models, this thesis makes the following contributions:
(1) A pedestrian detection system based on a region-based convolutional neural network is designed. Traditional hand-crafted features are costly to extract and struggle to represent pedestrians in complex scenes, so this thesis uses a deep convolutional neural network model for pedestrian detection. The model combines low-level features into more abstract high-level representations of attribute categories or features, and thereby extracts from the samples feature vectors that are more robust and describe the image better. Because the network is deep and has many parameters to train, while manually annotated pedestrian samples are scarce, the network is trained by fine-tuning to prevent overfitting. Multiple groups of experiments verify that, compared with the HOG-based method, the algorithm clearly improves pedestrian detection accuracy.
(2) The pedestrian detection system is optimized with the Edge Boxes algorithm, addressing the low efficiency of obtaining candidate regions with the Selective Search (SEL) algorithm. Generating candidate windows is critical to a pedestrian detection system, yet Selective Search takes about 2 seconds to extract the candidate regions of a single image, which seriously limits the detection efficiency of the whole system. With Edge Boxes, detection accuracy does not improve noticeably, but extracting the windows for one image takes only 0.3 seconds, which greatly improves the system's detection efficiency.
(3) A pedestrian detection framework based on a fast region-based convolutional neural network is designed. Because feature extraction with a deep convolutional neural network makes real-time operation hard to guarantee, this thesis introduces a region of interest pooling layer (RoI Pooling Layer) into the network model. With this layer, the model needs to extract convolutional features from the original image only once; each candidate region is then mapped onto the feature map to obtain a fixed-dimensional feature vector. Experiments show that the method greatly increases detection speed while maintaining a certain level of detection accuracy, improving the algorithm's real-time performance and applicability.
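The RoI pooling step described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `roi_max_pool`, the single-channel list-of-lists feature map, the `stride` parameter, and the floor/ceil rule for bin boundaries are all assumptions made for illustration. The point it shows is that candidate regions of different sizes, projected onto one shared feature map, each yield an output of the same fixed size.

```python
def _ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def roi_max_pool(feature_map, roi, out_size=(2, 2), stride=1):
    """Max-pool one region of interest into a fixed out_h x out_w grid.

    feature_map: H x W list of lists (one channel, hypothetical layout).
    roi: (x1, y1, x2, y2) in image pixels, inclusive.
    stride: assumed downsampling factor from image to feature map.
    """
    height, width = len(feature_map), len(feature_map[0])
    out_h, out_w = out_size

    # Project the image-space box onto the feature map and clamp it.
    x1, y1, x2, y2 = (int(round(v / stride)) for v in roi)
    x1 = max(0, min(x1, width - 1))
    y1 = max(0, min(y1, height - 1))
    x2 = max(x1, min(x2, width - 1))
    y2 = max(y1, min(y2, height - 1))
    roi_h, roi_w = y2 - y1 + 1, x2 - x1 + 1

    pooled = []
    for i in range(out_h):
        # Floor for the start and ceil for the end keep every bin non-empty.
        r0 = y1 + (i * roi_h) // out_h
        r1 = y1 + _ceil_div((i + 1) * roi_h, out_h)
        row = []
        for j in range(out_w):
            c0 = x1 + (j * roi_w) // out_w
            c1 = x1 + _ceil_div((j + 1) * roi_w, out_w)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# One shared feature map, computed once per image in the real pipeline.
fmap = [[0, 1, 2, 3],
        [4, 5, 6, 7],
        [8, 9, 10, 11],
        [12, 13, 14, 15]]

print(roi_max_pool(fmap, (0, 0, 3, 3)))  # 4x4 region -> [[5, 7], [13, 15]]
print(roi_max_pool(fmap, (1, 1, 3, 3)))  # 3x3 region -> [[10, 11], [14, 15]]
```

Both calls return a 2x2 grid even though the regions differ in size, which is what lets a fixed-size classifier head follow the pooling layer. Production implementations operate on multi-channel tensors and batches of regions; this sketch deliberately omits both.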
[Degree-granting institution]: Hangzhou Dianzi University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP391.41; TP183

[References]