Research on Context-Based Object Detection Algorithms

Published: 2018-05-21 13:12

Topic: object detection + context; Source: Nanjing University, 2017 master's thesis

[Abstract]: In recent years, with the spread of the Internet and the rise of video websites and social networks, people have gained access to large volumes of multimedia resources such as images and videos. As a result, computer vision has developed rapidly, and object detection within it has attracted growing attention. As a classification problem, object detection has also helped drive research in computer vision and machine learning. Object detection is widely applied, for example in face detection, pedestrian detection, vehicle detection and image classification. To achieve detection, the task is usually divided into two subtasks [1]: object classification and object localization. Object classification determines whether an object of a target class is present in the image and, if so, assigns it a class according to the classification probabilities. Object localization finds the position of the detected object, usually as a rectangular box. Traditional object detection algorithms generally have three steps: first, a sliding window selects a region; second, features are extracted from that region; finally, the region features are classified to produce a result. In face detection, for example, a sliding window is moved over the image, features such as LBP (Local Binary Pattern) or HOG (Histogram of Oriented Gradient) are extracted, and an SVM or AdaBoost classifier then decides whether the current window contains a face. Deep learning methods, which have spread gradually since 2006, have had a major impact on computer vision, and object detection based on deep learning has advanced by leaps and bounds. Deep learning detectors based on region proposals reach a high recall with only a small number of candidate windows, and deep learning detectors based on regression greatly increase detection speed. During detection, objects are often deformed, occluded, or seen from changing viewpoints, which degrades the results. Many studies show that making sensible use of the local context, global context and object context in an image can reduce the impact of these problems and thereby improve detection accuracy. To address these problems, this thesis proposes one context-based object detection algorithm built on a traditional method and one built on a deep learning method. The main research contents are as follows: 1. Based on LBP, a new feature histogram statistics method is designed that incorporates local context information. There are two main changes: the region over which the histogram is computed is extended, and different positions are given different weights when the histogram is computed. 2. Based on YOLOv2, a contextual object detection deep learning method is designed that incorporates object context information. First, inter-class correlations are computed on the training dataset. Then the YOLOv2 convolutional network produces bounding boxes and the classification probabilities of all classes. The class of the bounding box with the highest confidence is taken as the reference class, and the classification probabilities of all classes are adjusted according to the inter-class correlations. Finally, the class with the highest classification probability is taken as the designated class, the probability that a window contains an object of the designated class is computed, and windows below the threshold are filtered out. For these two methods, experiments were carried out on the ORL face dataset and the PASCAL object detection dataset, respectively, and the results show that the proposed methods achieve higher detection accuracy.
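The context re-scoring step of the second (YOLOv2-based) method can be illustrated with a minimal sketch. The abstract does not say how the inter-class correlation is computed or how it changes the class probabilities, so everything specific here is an assumption made for illustration: the correlation is taken to be the co-occurrence frequency P(class j present | class i present) over training images, the adjustment is a multiplicative blend controlled by a hypothetical `alpha` parameter, and the window score is confidence times adjusted class probability. The function names `class_correlation` and `rescore` are likewise hypothetical, and the detector output is represented as plain tuples rather than the output of any real YOLOv2 implementation.

```python
def class_correlation(image_labels, num_classes):
    """Estimate corr[i][j] as P(class j present | class i present).

    image_labels: one iterable of class indices per training image.
    This co-occurrence estimate is an assumption, not the thesis's formula.
    """
    single = [0] * num_classes
    pair = [[0] * num_classes for _ in range(num_classes)]
    for labels in image_labels:
        present = set(labels)
        for i in present:
            single[i] += 1
            for j in present:
                pair[i][j] += 1
    return [
        [pair[i][j] / single[i] if single[i] else 0.0 for j in range(num_classes)]
        for i in range(num_classes)
    ]


def rescore(detections, corr, alpha=0.5, threshold=0.3):
    """Re-score detections using the class of the most confident box as context.

    detections: list of (box, confidence, class_probs) tuples.
    alpha: assumed blend strength (0 leaves the probabilities unchanged).
    Returns a list of (box, class_index, score) for windows at or above threshold.
    """
    if not detections:
        return []
    # Reference class: the most probable class of the highest-confidence box.
    _, _, ref_probs = max(detections, key=lambda d: d[1])
    ref = max(range(len(ref_probs)), key=lambda c: ref_probs[c])

    kept = []
    for box, conf, probs in detections:
        # Scale each class probability by its correlation with the reference class.
        weighted = [p * ((1 - alpha) + alpha * corr[ref][c]) for c, p in enumerate(probs)]
        total = sum(weighted)
        if total == 0:
            continue
        adjusted = [w / total for w in weighted]
        # Designated class: the most probable class after adjustment.
        cls = max(range(len(adjusted)), key=lambda c: adjusted[c])
        score = conf * adjusted[cls]
        if score >= threshold:
            kept.append((box, cls, score))
    return kept
```

With this formulation, a box whose two top classes are tied before adjustment is resolved in favour of the class that co-occurs more often with the reference class in the training data, and low-scoring windows are dropped by the final threshold.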
[Degree-granting institution]: Nanjing University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.41

[References]