A Robust Visual Tracking Method Based on Deep Learning
Published: 2019-08-07 08:53
[Abstract]: Most traditional visual tracking methods (such as the L1 tracker) model the target directly from pixel-level features in each frame of the video sequence, without exploiting the deeper visual features inside each image patch. In real-world fixed-camera video surveillance scenes, a region can usually be found in which the target object has a clear, easily distinguishable appearance. This paper therefore selects, in advance for each video scene, a reference region in which the target's appearance can be clearly distinguished, uses it to construct training samples, and builds a two-branch, weight-sharing deep convolutional neural network. The network is trained so that the output features of a target outside the reference region are as similar as possible to the output features of the same target inside the reference region, thereby transferring the good target representation available inside the reference region. The trained model enhances the distinguishability of the target and can be plugged into tracking systems built on shallow features (such as the L1 tracker) to improve their robustness. Within the L1 tracking framework, this paper uses the trained network to extract features of target candidates for sparse representation, making the tracker robust to occlusion, illumination changes, and similar problems during tracking. Compared against 9 popular methods on 25 pedestrian videos, the proposed method achieves an average overlap rate 0.11 higher, and an average center location error 1.0 lower, than the second-best method.
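The abstract describes two concrete mechanisms: a two-branch, weight-sharing ("Siamese") CNN trained to make features of the target outside the reference region match features of the same target inside it, and an L1-tracker-style sparse representation of each target candidate over a template dictionary. The following is a minimal sketch of both, assuming PyTorch; the layer sizes, the 32×32 patch size, the mean-squared-error loss form, and the λ and step-size values are illustrative assumptions, not the paper's settings.

```python
# Minimal sketch (not the authors' released code) of the two ideas in the
# abstract: a weight-sharing two-branch CNN trained to pull features of the
# target seen outside the reference region toward features of the same target
# inside it, and an L1-tracker-style sparse coding of a candidate feature.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureNet(nn.Module):
    """Shared feature extractor; both branches reuse this one module,
    which is what 'two-branch with shared weights' amounts to."""

    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, feat_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(self.conv(x).flatten(1))


def similarity_loss(net, patch_outside, patch_inside):
    """Push the feature of the target outside the reference region toward
    the feature of the same target inside it (MSE is an assumed loss)."""
    return F.mse_loss(net(patch_outside), net(patch_inside))


def sparse_code(y, D, lam=0.01, steps=200, lr=0.05):
    """Solve min_c 0.5*||y - D c||_2^2 + lam*||c||_1 by proximal gradient
    descent (ISTA); an L1 tracker scores candidates by reconstruction error."""
    c = torch.zeros(D.shape[1])
    for _ in range(steps):
        grad = D.t() @ (D @ c - y)        # gradient of the quadratic term
        c = c - lr * grad
        c = torch.sign(c) * torch.clamp(c.abs() - lr * lam, min=0)  # soft-threshold
    return c


if __name__ == "__main__":
    net = FeatureNet()
    out_patch = torch.randn(8, 3, 32, 32)  # target crops outside the region
    in_patch = torch.randn(8, 3, 32, 32)   # matched crops inside the region
    loss = similarity_loss(net, out_patch, in_patch)
    loss.backward()

    y = net(out_patch[:1]).detach().squeeze(0)    # one candidate feature
    D = F.normalize(torch.randn(128, 10), dim=0)  # 10 template features
    c = sparse_code(y, D)
    print(loss.item(), float(torch.norm(y - D @ c)))
```

In a tracker along these lines, the dictionary D would hold features of target templates (plus trivial templates in the classic L1 formulation), and each candidate would be scored by its reconstruction error under the recovered sparse code.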
[Affiliation]: National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
[Funding]: Supported by the National Basic Research Program of China (973 Program, 2012CB316304) and the Key Program of the National Natural Science Foundation of China (61432019)
[CLC Number]: TP391.41
Article No.: 2523847
本文链接:https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2523847.html