Research on Object Tracking Methods Based on Multimodal Visual Data Fusion

Published: 2018-06-12 22:32

Topic: object tracking + multimodal; Reference: Anhui University, 2017 master's thesis

[Abstract]: Visual object tracking aims to compute the position of a selected target in every frame of a continuous video sequence or online video stream. It is a fundamental and important research topic in computer vision, with broad application value in scenarios such as target guidance, autonomous driving, and action recognition. Single-modality visible-light tracking, the primary problem in visual object tracking, has produced a rich body of results in recent years. Researchers have proposed many tracking algorithms based on different theoretical frameworks, improving both speed and accuracy, and have built visible-light tracking datasets covering a variety of complex conditions to evaluate these algorithms. This work has laid the theoretical foundation for single-modality visible-light tracking and is widely applied in practical engineering projects. Although current single-modality visible-light trackers still perform well in many complex scenes, they fail under certain extreme conditions, such as low or zero illumination. To address this problem, researchers have introduced thermal infrared images or color-depth images to compensate for the shortcomings of single-modality visible-light video. Because visible-light video and thermal infrared video are strongly complementary, multimodal tracking algorithms based on the two have attracted wide attention in recent years. This thesis studies multimodal object tracking based on thermal infrared and visible-light video. Its main contributions are as follows.
(1) A multimodal tracking algorithm based on modality reliability correlation is proposed. Because thermal infrared and visible light have different imaging mechanisms, the target information obtained under each mechanism carries a different weight. To evaluate the weights of the modalities, so that a conventional single-modality algorithm can always track in the better modality, this thesis proposes a criterion for defining modality reliability and, on that basis, designs and implements a real-time multimodal tracking algorithm. The algorithm adaptively exploits thermal infrared and visible-light information to track the target continuously and robustly. During tracking, a model update scheme lets the tracking model adapt to changes in target appearance and reduces the influence of noise.
(2) A multimodal collaborative tracking algorithm that fuses local and global information is proposed. In multimodal tracking, different video modalities carry different weights; moreover, different regions of a tracking sample contribute differently to the tracking result. Taking into account both the modality weights and the weights of the different patches of a tracking sample, this thesis proposes a collaborative tracking algorithm that fuses multimodal data. The model uses joint sparse representation learning to fully exploit the intrinsic relationship between the target sample and the image patches within it. When processing the patches, the model preserves their spatial layout. It applies joint weighting to account for the different contributions of the target sample and its local patches to the tracking result. Finally, it takes the weights of the different modalities into account and solves for them jointly with the sparse appearance representation model of the whole target.
(3) A multimodal tracking dataset covering a variety of complex conditions is constructed. Currently public multimodal datasets, such as OSU and AIC, contain few scenes and few video sequences, which makes them unsuitable for evaluating multimodal trackers. To establish a unified multimodal tracking dataset for evaluating multimodal tracking algorithms, this thesis builds a multimodal video dataset that includes complex conditions such as low illumination and background clutter. The videos cover challenging factors such as a single person walking under low illumination, two people crossing and occluding each other, and a single rigid bicycle in motion. After preliminary sorting, scene alignment, and manual annotation of target positions, the raw video data form a reasonably complete multimodal tracking evaluation dataset.
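
The idea behind contribution (1), weighting each modality by how reliable it currently is, can be illustrated with a minimal sketch. The abstract does not state the thesis's actual reliability criterion, so the score used here (peak value minus mean of a tracker response map) is a stand-in assumption, and the names `peak_sharpness`, `modality_weights`, and `fuse_and_locate` are hypothetical.

```python
from typing import List, Tuple

ResponseMap = List[List[float]]  # per-pixel tracker confidence for one modality


def peak_sharpness(response: ResponseMap) -> float:
    """Assumed reliability score (not the thesis's criterion):
    how far the strongest response stands above the map's mean."""
    values = [v for row in response for v in row]
    return max(values) - sum(values) / len(values)


def modality_weights(rgb: ResponseMap, thermal: ResponseMap) -> Tuple[float, float]:
    """Normalize the two reliability scores so the weights sum to 1."""
    r, t = peak_sharpness(rgb), peak_sharpness(thermal)
    total = r + t
    if total <= 0.0:  # both maps are flat, so neither modality is preferred
        return 0.5, 0.5
    return r / total, t / total


def fuse_and_locate(
    rgb: ResponseMap, thermal: ResponseMap
) -> Tuple[Tuple[int, int], Tuple[float, float]]:
    """Fuse the two maps with reliability weights and return the
    (row, col) of the fused peak together with the weights used."""
    w_rgb, w_thermal = modality_weights(rgb, thermal)
    fused = [
        [w_rgb * a + w_thermal * b for a, b in zip(row_rgb, row_thermal)]
        for row_rgb, row_thermal in zip(rgb, thermal)
    ]
    cells = [(i, j) for i, row in enumerate(fused) for j in range(len(row))]
    location = max(cells, key=lambda ij: fused[ij[0]][ij[1]])
    return location, (w_rgb, w_thermal)


# Low-illumination example: the visible-light map is almost flat,
# while the thermal map has a clear peak at row 1, column 2.
rgb_map = [[0.2, 0.2, 0.2], [0.2, 0.3, 0.2], [0.2, 0.2, 0.2]]
thermal_map = [[0.1, 0.1, 0.1], [0.1, 0.1, 0.9], [0.1, 0.1, 0.1]]
location, weights = fuse_and_locate(rgb_map, thermal_map)
print(location, weights)
```

In this toy case the thermal map receives most of the weight, so the fused peak follows the thermal response. A real tracker would also need the model update step mentioned in the abstract, which this sketch leaves out.
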
[Degree-granting institution]: Anhui University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP391.41

[Similar literature]