Monocular Image Depth Estimation Based on Deep Learning
Published: 2019-06-08 12:08
[Abstract]: 3D scene parsing is an important research topic in computer vision, and depth estimation is a key means of understanding the 3D geometry of a scene. In many computer vision tasks, such as semantic segmentation, pose estimation, and object detection, fusing relatively accurate and reliable depth information can substantially improve performance compared with using RGB images alone. Traditional monocular depth estimation methods rely on optical and geometric constraints or on environmental assumptions, such as structure from motion, focus, or illumination changes. In the absence of such constraints or assumptions, building a computer vision system that accurately estimates depth from a single monocular image is a highly challenging task. The task poses two main difficulties. First, unlike the human brain, a typical computer vision system struggles to extract enough information from a single image to infer 3D structure. Second, the task is inherently ill-posed: one 2D image corresponds to infinitely many real 3D scenes. This inherent ambiguity in mapping a single image to a depth map means that a visual model cannot estimate exact depth values from a single image alone. To address these two problems, this thesis proposes the following methods. First, it presents a computer vision model that unifies a convolutional neural network (CNN) and a conditional random field (CRF) within a single deep learning framework: the CNN extracts rich relevant features, while the CRF refines the network output using the position and color information of pixels.
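The CNN-plus-CRF idea described above can be illustrated with a minimal sketch. Assuming a coarse per-pixel depth map is already available (standing in for the CNN output, which is not reproduced here), a Gaussian-CRF-style energy with color-dependent pairwise weights smooths the prediction while respecting image edges. The function name, the 4-connected-neighbor simplification, and all parameter values are illustrative assumptions, not the thesis's actual model.

```python
import numpy as np

def gaussian_crf_refine(unary_depth, rgb, lam=1.0, sigma_col=0.1,
                        n_iters=200, lr=0.05):
    """Refine a coarse depth map with a Gaussian-CRF-style energy.

    E(d) = sum_i (d_i - u_i)^2 + lam * sum_{i~j} w_ij (d_i - d_j)^2,
    where u is the coarse (CNN-like) prediction and i~j runs over
    4-connected neighbors (a simplification of a fully connected CRF).
    For adjacent pixels the spatial distance is constant, so the
    pairwise weight w_ij here depends only on color similarity.
    The quadratic energy is minimized by plain gradient descent.
    """
    d = unary_depth.copy()

    def weight(c1, c2):
        # Bilateral-style weight: near 1 for similar colors, near 0 across edges.
        col = np.sum((c1 - c2) ** 2, axis=-1)
        return np.exp(-col / (2 * sigma_col ** 2))

    w_right = weight(rgb[:, :-1], rgb[:, 1:])   # weights to right neighbor
    w_down = weight(rgb[:-1, :], rgb[1:, :])    # weights to lower neighbor

    for _ in range(n_iters):
        grad = 2.0 * (d - unary_depth)          # unary (data) term
        diff_r = d[:, :-1] - d[:, 1:]
        diff_d = d[:-1, :] - d[1:, :]
        grad[:, :-1] += 2.0 * lam * w_right * diff_r   # pairwise term
        grad[:, 1:] -= 2.0 * lam * w_right * diff_r
        grad[:-1, :] += 2.0 * lam * w_down * diff_d
        grad[1:, :] -= 2.0 * lam * w_down * diff_d
        d -= lr * grad
    return d
```

On a noisy depth map whose discontinuity coincides with a color edge, this refinement reduces noise within uniform regions while keeping the depth jump at the edge, which is the role the abstract assigns to the CRF stage.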
Second, to counter the ill-posedness of the problem, this thesis proposes a visual model that fuses sparse known labels. Using a small set of relatively accurate depth values as references, the model greatly narrows the search range of plausible depth values at the remaining pixels, thereby reducing, to a certain extent, the ambiguity of the mapping from RGB image to depth map. In summary, this thesis surveys recent progress in depth estimation from monocular images, including the relevant datasets, methods, and their performance, and analyzes and discusses open problems and future directions. It proposes a computer vision model that learns feature representations of depth information from monocular images and, in view of the problem's ill-posedness, a second model that fuses sparse known labels to reduce the ambiguity of the image-to-depth mapping. The effectiveness and superiority of both models are verified on the NYU Depth v2 dataset.
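The sparse-label idea can likewise be sketched in miniature. Assuming a handful of trusted depth values at known pixel locations, one simple way to see how they constrain the rest of the image is to propagate them with color-aware smoothness: unanchored pixels take a weighted harmonic interpolation of their neighbors, with weights that drop across color edges. This is an illustrative stand-in for the thesis's model, not its actual method; the function name, the dense solver, and the parameters are all assumptions made for clarity on tiny inputs.

```python
import numpy as np

def densify_from_sparse(rgb, sparse_depth, mask, sigma_col=0.1):
    """Propagate sparse known depths to a dense map.

    Minimizes sum_{i~j} w_ij (d_i - d_j)^2 subject to d_i fixed wherever
    mask is True, with color-aware weights over 4-connected neighbors.
    Builds and solves the resulting linear system (a dense solve for
    clarity; a real implementation would use a sparse solver).
    """
    h, w = mask.shape
    n = h * w
    idx = np.arange(n).reshape(h, w)
    A = np.zeros((n, n))
    b = np.zeros(n)

    def wgt(c1, c2):
        return np.exp(-np.sum((c1 - c2) ** 2) / (2 * sigma_col ** 2))

    for y in range(h):
        for x in range(w):
            i = idx[y, x]
            if mask[y, x]:
                # Anchored pixel: fix it to the known depth value.
                A[i, i] = 1.0
                b[i] = sparse_depth[y, x]
                continue
            # Free pixel: sum_j w_ij * (d_i - d_j) = 0 over its neighbors.
            for dy, dx in ((0, 1), (0, -1), (1, 0), (-1, 0)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    wij = wgt(rgb[y, x], rgb[ny, nx])
                    A[i, i] += wij
                    A[i, idx[ny, nx]] -= wij
    return np.linalg.solve(A, b).reshape(h, w)
```

With one anchor per color-uniform region, the known values spread to fill each region while barely leaking across color edges, which mirrors how sparse reliable labels shrink the set of plausible depth maps.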
[Degree-granting institution]: Harbin University of Science and Technology
[Degree level]: Master's
[Year conferred]: 2017
[CLC number]: TP391.41;TP18
Article ID: 2495273
Link: https://www.wllwen.com/kejilunwen/zidonghuakongzhilunwen/2495273.html