Research on a Multi-Task Object Detection System Based on High-Order Fusion of Multi-Layer Convolutional Features

Published: 2018-07-27 13:25

[Abstract]: With the broad success of deep learning in computer vision, object detection based on convolutional neural networks (CNNs) has advanced rapidly. As one of the active research topics in computer vision, object detection has wide application prospects in areas such as video surveillance and automated grasping by industrial robots. In automated grasping, real scenes usually contain many small objects, and the robot needs an accurate estimate of each object's pose. Existing CNN-based detection algorithms are usually designed for large objects: they localize small objects poorly and cannot estimate changes in object pose. To address these problems, and starting from the practical needs of robotic grasping, this thesis introduces a second-order fusion structure over multi-layer deep features to improve small-object detection, and adds an angle prediction layer to the CNN architecture so that multi-task learning improves the accuracy of both object localization and pose estimation. For small-object detection, the thesis first builds on the Hyper-Column feature fusion algorithm, brings the second-order features commonly used in image classification into the object detection framework, and implements two second-order feature fusion schemes that retain positional information: a second-order response transform model (SORT) based on a nonlinear transformation of the features, and a second-order kernel fusion model (HIHCA) based on a feature kernel method. Experiments on the VOC dataset show that both Hyper-Column-based second-order fusion methods effectively improve system performance. To further exploit the better localization ability of low-level CNN features, we combine the recent TDM model for hierarchical feature representation with second-order information and propose an object detection model based on second-order TDM feature fusion; results on the VOC dataset verify the effectiveness of the proposed model. For object pose estimation, we introduce an angle prediction layer that adds pose estimation of rigid objects as a subtask of object detection, and train the model end to end through multi-task learning. Experiments on the PASCAL 3D dataset show that the proposed multi-task learning strategy effectively improves both pose estimation and object detection performance. Finally, the two proposed models are combined into a complete object detection system, whose performance in real scenes is verified through a robotic object-grasping application.
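The abstract does not give formulas for its fusion schemes, so the following is only a minimal NumPy sketch of what a position-preserving second-order fusion of two feature maps could look like. The function names `upsample_nearest` and `second_order_fuse`, the nearest-neighbour upsampling, and the specific form (first-order sum plus the square root of the rectified element-wise product) are all assumptions made for illustration, not the thesis's actual SORT or HIHCA implementation.

```python
import numpy as np


def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map (hypothetical helper)."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)


def second_order_fuse(shallow, deep, eps=1e-6):
    """Fuse two (C, H, W) maps: first-order sum plus an element-wise second-order term.

    Assumed form for illustration only: shallow + deep + sqrt(max(shallow * deep, 0) + eps).
    """
    if shallow.shape != deep.shape:
        raise ValueError("feature maps must share the same shape before fusion")
    # The element-wise product keeps the spatial layout, unlike globally pooled
    # second-order statistics used in image classification.
    interaction = np.maximum(shallow * deep, 0.0)
    return shallow + deep + np.sqrt(interaction + eps)


# Toy example: a 16x16 shallow map fused with an 8x8 deep map upsampled by 2.
rng = np.random.default_rng(0)
shallow = rng.standard_normal((8, 16, 16))
deep = rng.standard_normal((8, 8, 8))
fused = second_order_fuse(shallow, upsample_nearest(deep, 2))
print(fused.shape)
```

Because every operation is element-wise, each output location depends only on the same location in the two inputs, which is one simple way to retain the positional information that detection needs.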
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP391.41
