Research on Automatic Video Object Segmentation Algorithms in Complex Scenes

Published: 2018-05-03 18:10

Topics: video object segmentation + optical flow; Source: University of Science and Technology of China, 2017 master's thesis

[Abstract]: With the steady upgrading of Internet infrastructure and the rapid spread of mobile devices, shooting and watching video has become increasingly convenient. Because of the richness and vividness of the information it carries, video has become one of the important media for spreading information in daily life. The ever-growing volume of video data also creates a need to recognize, retrieve, and understand video content. Reducing the difficulty of understanding video content and extracting the key information in a video have therefore become important research topics in video processing. Because video object segmentation aims to segment salient foreground objects effectively, it is widely applied in video summarization, video retrieval, action analysis, and semantic video understanding. Most current video object segmentation algorithms are bottom-up methods: they segment salient foreground objects by extracting and analyzing low-level features such as color, edges, and motion. Traditional algorithms that rely on manual annotation can no longer meet the demands of today's large-scale video data. At the same time, the scenes and shooting conditions found in massive video collections are complex and varied, so current automatic video object segmentation algorithms do not remain robust in some complex scenes. To address these problems, this thesis proposes two automatic video object segmentation algorithms suited to different scenes. The main work and contributions are as follows:

1. Existing graph-cut-based algorithms are easily disturbed by background noise and pixel mismatches, and are not robust in some complex scenes. This thesis proposes an automatic video object segmentation algorithm based on the optical flow field and graph cuts that improves on these points. Before segmenting the foreground object, the algorithm analyzes the global motion features of the video to obtain prior knowledge of the foreground object, which reduces interference from background noise. To handle pixel mismatches, the algorithm introduces a dynamic location-model optimization mechanism that uses a location model of the foreground object to strengthen the temporal continuity of the segmentation results. Experiments show that the algorithm produces more accurate and more robust segmentation results in scenes with fast camera motion or irregular foreground motion.

2. In some complex scenes, existing algorithms based on candidate objects often produce partially missing segmentation results. The root causes are that the candidate objects are too fragmented and that the temporal mapping between candidate objects is not accurate enough. This thesis proposes an improved algorithm based on candidate objects. It extends and merges the original candidate objects in the temporal domain, which both reduces fragmentation and improves the temporal continuity of candidate objects between adjacent frames. To further improve the accuracy of the model's temporal mapping, the algorithm introduces additional image features for computing the model's edge weights. Experiments on several benchmark datasets show that, compared with existing algorithms of the same kind, the algorithm is more resistant to background noise and yields more complete segmentations in scenes with complex backgrounds, water reflections, and similar conditions.
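The abstract does not say how the first algorithm analyzes global motion, so the following is only a minimal sketch of one common way to turn an optical flow field into a foreground prior. It assumes the background covers most of the frame, takes the median flow vector as the camera motion, and marks pixels whose flow deviates from it. The function name `foreground_motion_prior`, the `threshold` parameter, and the toy flow field are all hypothetical and are not taken from the thesis.

```python
from math import hypot
from statistics import median


def foreground_motion_prior(flow, threshold=1.0):
    """Mark pixels whose motion differs from the dominant (camera) motion.

    flow: list of rows, each row a list of (dx, dy) displacements per pixel.
    Returns a grid of booleans with the same shape as flow.
    """
    dxs = [dx for row in flow for dx, _ in row]
    dys = [dy for row in flow for _, dy in row]
    # Assumption: most pixels belong to the background, so the per-axis
    # median of the flow approximates the global camera motion.
    global_dx, global_dy = median(dxs), median(dys)
    # A pixel is a foreground candidate if its residual motion is large.
    return [
        [hypot(dx - global_dx, dy - global_dy) > threshold for dx, dy in row]
        for row in flow
    ]


# Toy 3x4 flow field: the camera pans by 2 px, two pixels move differently.
flow = [[(2.0, 0.0)] * 4 for _ in range(3)]
flow[1][1] = (-1.0, -3.0)
flow[1][2] = (-1.0, -3.0)
prior = foreground_motion_prior(flow)
```

In a pipeline of the kind the abstract describes, a prior like this would typically feed the unary terms of a graph-cut energy. That step is omitted here because the abstract gives no details about the energy function.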
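[Degree-granting institution]: University of Science and Technology of China
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP391.41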
