Research and Implementation of Object Segmentation and Recognition Methods

Published: 2018-04-09 05:14

Topic: object segmentation. Focus: object recognition. Source: Nanjing University, master's thesis, 2017

[Abstract]: Segmenting and recognizing objects in static images are two important topics in computer vision; they are closely related, and each can make use of the other. Pixel-level segmentation of a static image is difficult, however, because real photographs are often affected by illumination and noise, and the background can be complex and may resemble the target object in color and texture. For object recognition in static images, many current methods combine a sliding window, object features, and a classifier. The main drawback of this combination is that the window must be slid repeatedly across the entire image and every window classified, which makes it slow. To address these problems, we designed several solutions for segmenting and recognizing objects in static images, and evaluated them through experiments and comparative analysis. We first propose an interactive segmentation method based on the AdaBoost classification approach that classifies superpixels, rather than individual pixels, as the unit of processing; the user only needs to provide a small number of sample seed points. Building on this method, we incorporate human pose information so that people in static images can be segmented automatically, in arbitrary poses. Compared with methods that first locate the body by detecting the face and then segment it, our method handles not only frontal views but also side and back views. We also use convolutional neural networks for object segmentation, in a framework made up of two networks: a localization network that locates the object in the image, and a segmentation network that segments the image. The segmentation results obtained above are often coarse, so we refine them with two methods, Gaussian background modeling and Bayesian matting, and we improve the Bayesian matting algorithm. For object recognition, we draw on region-based recognition methods: graph cut first divides the image into multiple regions, similar regions are then merged repeatedly, and finally each resulting region is classified. To classify the object within a single region, we extract features with either HOG or a CNN and then train an SVM; the CNN features give better results. However, extracting CNN features in this way requires a separate convolution pass for each region, which is time-consuming. We therefore follow the Fast R-CNN approach and add a pooling layer after the convolutional layers, so that only one convolution pass over the whole image is needed, which greatly reduces the running time. Finally, we carried out a range of object segmentation and recognition experiments on several widely used datasets, where our methods achieved good accuracy and performance.
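The speed-up described at the end of the abstract (one convolution pass over the whole image, then a pooling step per region) can be illustrated with a minimal sketch. This is not the thesis implementation: the function name `roi_max_pool`, the `(top, left, bottom, right)` box format, the integer bin edges, and the toy single-channel feature map are all assumptions made for illustration. The idea shown is that every candidate region reads from the same precomputed feature map and is reduced to a fixed-size grid, so a downstream classifier sees inputs of one size without re-running the convolution.

```python
from typing import List, Tuple

FeatureMap = List[List[float]]   # one channel, H x W (stand-in for a CNN output)
Box = Tuple[int, int, int, int]  # (top, left, bottom, right) in feature-map cells,
                                 # bottom/right exclusive; hypothetical format


def roi_max_pool(feature_map: FeatureMap, box: Box,
                 out_h: int = 2, out_w: int = 2) -> FeatureMap:
    """Max-pool the cells inside `box` into a fixed out_h x out_w grid."""
    top, left, bottom, right = box
    height, width = bottom - top, right - left
    if height < out_h or width < out_w:
        raise ValueError("region is smaller than the output grid")
    pooled = []
    for i in range(out_h):
        # Integer bin edges; since height >= out_h, every bin has at least one row.
        r0 = top + (i * height) // out_h
        r1 = top + ((i + 1) * height) // out_h
        row = []
        for j in range(out_w):
            c0 = left + (j * width) // out_w
            c1 = left + ((j + 1) * width) // out_w
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# Toy 4 x 6 feature map, computed once for the whole image (here just r * 6 + c).
shared_map = [[float(r * 6 + c) for c in range(6)] for r in range(4)]

# Two regions of different sizes reuse the same map and yield same-size outputs.
regions = [(0, 0, 4, 4), (1, 2, 4, 6)]
for region in regions:
    print(region, roi_max_pool(shared_map, region))
```

In a real pipeline the feature map would have many channels and the boxes would be projected from image coordinates onto the feature map; both steps are omitted here to keep the sketch short.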
[Degree-granting institution]: Nanjing University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP391.41

[Similar literature]