Deep-Learning-Driven Scene Analysis and Semantic Object Parsing
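Keywords: deep learning; convolutional neural network; depth estimation; optical flow estimation; fine-grained pedestrian analysis; total variation model; multi-scale correlation learning. Source: Zhejiang University, 2017 master's thesis. Type: degree thesis.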
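【Abstract】: Semantic object parsing and scene analysis are important research directions in computer vision. Their main goal is to analyze and understand the objects and scenes in images and videos, and they are widely applied in video surveillance, autonomous driving, intelligent transportation and related areas. Semantic object parsing involves detecting, recognizing and analyzing objects such as pedestrians and vehicles. Fine-grained pedestrian analysis underlies many computer vision applications: it segments a pedestrian image into semantic parts and recognizes their attributes. Scene analysis mainly covers depth estimation, motion analysis and structure analysis. Depth estimation recovers scene depth from images, which helps reconstruct the three-dimensional structure of the scene. Motion analysis mainly means obtaining optical flow from consecutive video frames, which is used to recognize the actions of moving objects and to detect and classify abnormal events. Effective algorithms for fine-grained pedestrian analysis, image depth estimation and optical flow estimation therefore have considerable practical value, and these three tasks are the focus of this thesis. In recent years, deep learning has achieved breakthroughs in computer vision tasks such as object detection, face recognition and scene labeling, and task-oriented network design has drawn increasing attention from academia and industry. For each of the three tasks (fine-grained pedestrian analysis, single-image depth estimation and optical flow estimation), this thesis proposes a different deep-learning model, as follows:

1. Single-image depth estimation. After reviewing existing methods, the thesis addresses a weakness of current deep-learning depth estimation models, namely their limited modeling of spatial context, with two proposals: a data-driven context feature learning model and a loss function based on the total variation (TV) model. The former learns, from data, context weights that depend on pixel position and uses them to fuse neighborhood features into the depth prediction; the latter effectively suppresses noise and produces smoother results while preserving edges. Finally, the two models are combined into a more effective method.

2. Optical flow estimation. Compared with traditional optical flow methods, deep-learning methods are efficient and easy to extend. However, few such methods exist so far, and existing deep models handle large-displacement optical flow poorly. The thesis proposes a deep convolutional network based on multi-scale correlation learning that handles large displacements effectively; on several large-displacement optical flow datasets it clearly outperforms the baseline algorithm. In addition, because the predictions still contain considerable noise and large errors, the thesis combines a recurrent neural network with the convolutional neural network to refine the predictions further and obtain finer results.

3. Fine-grained pedestrian analysis. For a competition on fine-grained pedestrian recognition in surveillance video, the thesis proposes two frameworks based on Faster R-CNN. The first jointly learns part detection and part attribute classification in a single network; the second first detects part locations with a Faster R-CNN framework and then trains a separate network to classify part attributes. Experiments show that the staged detect-then-classify approach reduces interference between classes and thereby reduces misclassification.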
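The two depth-estimation components described above (context feature learning and the TV-based loss) can be illustrated together: a context-fusion step in which every pixel averages its 3x3 neighborhood with its own, position-dependent weights, followed by a loss that adds a total-variation penalty to a data term. This is a minimal sketch under stated assumptions, not the thesis's implementation: scalar features, a 3x3 neighborhood, softmax-normalized weights, replicate padding, a squared-error data term, anisotropic TV and the weight `lam` are all assumptions, and every function name is hypothetical. In a trained network the weight logits would be predicted from data; here they are passed in directly.

```python
import math
from typing import List

Grid = List[List[float]]

# Hypothetical 3x3 neighbourhood; OFFSETS[4] == (0, 0) is the centre pixel.
OFFSETS = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax, so each pixel's neighbourhood weights sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_context(feat: Grid, logits: List[List[List[float]]]) -> Grid:
    """Weighted 3x3 average whose weights depend on the pixel position.

    logits[i][j][k] is the unnormalised weight of neighbour OFFSETS[k] for pixel (i, j);
    in a real model a learned branch would predict it (assumption).
    Out-of-range neighbours are replaced by the nearest border pixel (replicate padding).
    """
    rows, cols = len(feat), len(feat[0])
    fused = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for w, (di, dj) in zip(softmax(logits[i][j]), OFFSETS):
                ni = min(max(i + di, 0), rows - 1)
                nj = min(max(j + dj, 0), cols - 1)
                acc += w * feat[ni][nj]
            fused[i][j] = acc
    return fused


def total_variation(depth: Grid) -> float:
    """Anisotropic TV: sum of |differences| between vertical and horizontal neighbours."""
    rows, cols = len(depth), len(depth[0])
    tv = 0.0
    for i in range(rows):
        for j in range(cols):
            if i + 1 < rows:
                tv += abs(depth[i + 1][j] - depth[i][j])
            if j + 1 < cols:
                tv += abs(depth[i][j + 1] - depth[i][j])
    return tv


def tv_regularized_loss(pred: Grid, target: Grid, lam: float = 0.1) -> float:
    """Mean squared error plus lam * TV(pred), both normalised by the pixel count."""
    n = len(pred) * len(pred[0])
    mse = sum((p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)) / n
    return mse + lam * total_variation(pred) / n
```

With all-zero logits, `fuse_context` reduces to a plain 3x3 box average, and a dominant centre logit reproduces the input; a learned model can sit anywhere in between, pixel by pixel. On the loss side, TV charges each jump by its absolute size, so a sharp step and a gradual ramp with the same total rise cost the same, whereas a squared-gradient penalty would favour the ramp. This is the usual explanation for why TV regularization smooths noise while preserving depth edges.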
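The multi-scale correlation learning proposed for large-displacement optical flow can be illustrated with a FlowNet-style correlation (a dot product between the feature vector at a pixel in frame 1 and the feature vectors at displaced positions in frame 2), evaluated on progressively downsampled feature maps. The idea behind multiple scales is that a fixed search radius `max_disp` at 1/2^s resolution covers roughly `max_disp * 2**s` pixels of the original image, so large motions fall inside the search window at coarse scales. This is a minimal sketch, not the thesis's network: 2x2 average pooling, channel-averaged dot products, zero cost outside the image and all function names are assumptions.

```python
from typing import Dict, List, Tuple

FeatureMap = List[List[List[float]]]  # [row][col][channel]
CostVolume = Dict[Tuple[int, int], List[List[float]]]


def correlation(f1: FeatureMap, f2: FeatureMap, max_disp: int) -> CostVolume:
    """For every displacement (dy, dx) with |dy|, |dx| <= max_disp, store the
    channel-averaged dot product <f1[y][x], f2[y+dy][x+dx]> (0 outside the image)."""
    rows, cols, ch = len(f1), len(f1[0]), len(f1[0][0])
    volume: CostVolume = {}
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            plane = [[0.0] * cols for _ in range(rows)]
            for y in range(rows):
                for x in range(cols):
                    y2, x2 = y + dy, x + dx
                    if 0 <= y2 < rows and 0 <= x2 < cols:
                        plane[y][x] = sum(a * b for a, b in zip(f1[y][x], f2[y2][x2])) / ch
            volume[(dy, dx)] = plane
    return volume


def avg_pool2(f: FeatureMap) -> FeatureMap:
    """2x2 average pooling; halves the resolution (an odd last row/column is dropped)."""
    ch = len(f[0][0])
    return [[[(f[2 * y][2 * x][c] + f[2 * y][2 * x + 1][c]
               + f[2 * y + 1][2 * x][c] + f[2 * y + 1][2 * x + 1][c]) / 4
              for c in range(ch)]
             for x in range(len(f[0]) // 2)]
            for y in range(len(f) // 2)]


def multiscale_correlation(f1: FeatureMap, f2: FeatureMap,
                           max_disp: int, num_scales: int) -> List[CostVolume]:
    """Cost volumes at full, 1/2, 1/4, ... resolution; coarser scales see larger motion."""
    volumes = []
    for s in range(num_scales):
        if s > 0:  # downsample both frames before correlating at the next scale
            f1, f2 = avg_pool2(f1), avg_pool2(f2)
        volumes.append(correlation(f1, f2, max_disp))
    return volumes
```

For example, if a 2x2 block of ones in a 4x8 single-channel map moves 2 pixels to the right, a radius of 1 sees no match at full resolution, but at half resolution the block moves by one cell and the cost volume at its location peaks at displacement (0, 1).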
【Degree-granting institution】: Zhejiang University
【Degree level】: Master's
【Year conferred】: 2017
【CLC classification】: TP391.41; TP18
【Author】: Zhao Shanshan (赵杉杉)