Research on Spatio-temporal Feature Extraction Methods
[Abstract]: Video content recognition is a central problem in computer vision, with applications in intelligent video surveillance, human-computer interaction, video retrieval, and other fields. A good feature representation is crucial for video content recognition, yet extracting such features remains difficult: despite recent progress, existing methods still do not transfer well to real-world scenes. This dissertation studies spatio-temporal feature extraction for video content recognition from two angles, recognition accuracy and recognition speed. The main contributions are as follows.

1. Slow feature analysis (SFA) extracts slowly varying signals from rapidly varying ones. In the visual system, the primary visual cortex feeds both the ventral and the dorsal pathway, which process appearance and motion information respectively. When SFA is applied to local feature extraction, however, only the slowly varying components are kept, and these mainly capture static appearance. To exploit temporal information more fully, this dissertation extends SFA to temporal variance analysis (TVA). TVA learns a linear mapping that projects the input onto feature components with different rates of temporal variation; local features are then extracted with local receptive fields via convolution and pooling. TVA-based feature extraction is evaluated on four action recognition datasets. The results show that both the slow and the fast features learned by TVA are effective representations, and that the two are complementary.

2. Dynamic textures such as flames, smoke, and traffic flow occur widely and in many forms. Because dynamic texture sequences exhibit complex temporal variations, their recognition is a challenging problem. This dissertation proposes a dynamic texture recognition method based on slow feature analysis.
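To make the slowness principle behind SFA and TVA concrete, the following sketch is a minimal linear SFA in NumPy (a hypothetical illustration, not the thesis implementation): the signal is whitened, and the projection whose temporal derivative has the smallest variance is taken as the slowest feature. TVA, as described above, would keep projections across the whole spectrum of temporal variation rather than only the slowest directions.

```python
import numpy as np

def linear_sfa(x, n_components=1):
    """Minimal linear slow feature analysis.

    x: array of shape (T, D), a multivariate time series.
    Returns the n_components slowest projections of x and the
    corresponding projection matrix.
    """
    # Center and whiten the input so its covariance becomes the identity.
    x = x - x.mean(axis=0)
    cov = np.cov(x, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)
    whiten = eigvec / np.sqrt(eigval)      # column-scaled eigenvectors
    z = x @ whiten

    # Finite differences approximate the temporal derivative.
    dz = np.diff(z, axis=0)

    # The slowest directions are the eigenvectors of the derivative
    # covariance with the smallest eigenvalues (eigh sorts ascending).
    dcov = np.cov(dz, rowvar=False)
    _, dvec = np.linalg.eigh(dcov)
    w = dvec[:, :n_components]
    return z @ w, whiten @ w

# A slow sinusoid mixed with a fast one: SFA should recover the slow one.
t = np.linspace(0, 2 * np.pi, 500)
slow, fast = np.sin(t), np.sin(20 * t)
mix = np.stack([slow + fast, slow - fast], axis=1)
features, projection = linear_sfa(mix, n_components=1)
```

Up to sign, the recovered feature is essentially the slow sinusoid; keeping the largest-eigenvalue directions instead would recover the fast component.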
Slow feature analysis can learn features that are invariant to the complex variations of dynamic textures. Full temporal invariance, however, requires high-level semantic features, which are hard to learn directly from high-dimensional video with SFA alone. We therefore propose manifold regularized SFA (MR-SFA), which learns local features at a lower semantic level to describe complex dynamic textures. MR-SFA constrains features with similar initial states to also vary similarly over time, so it learns slowly varying features whose evolution is partially predictable, helping it cope with the complexity of dynamic textures. Experiments on dynamic texture recognition and scene recognition datasets demonstrate the effectiveness of MR-SFA.

3. Traditional video feature extraction is too time-consuming for real-time or large-scale applications. In conventional compressed video, DCT (discrete cosine transform) coefficients encode the residual between each block and the reference block indicated by its motion vector. We propose a family of features, residual edge histograms, that describe a video using different parts of these DCT coefficients. For depth-map video, we additionally exploit compressed-domain information in the form of DWT (discrete wavelet transform) coefficients and breakpoints, from which a set of depth-map features is extracted. These compressed-domain feature extraction methods are validated on action recognition datasets, and the experimental results show that they outperform traditional methods.
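As a rough illustration of building an edge histogram from block DCT coefficients, the sketch below (a hypothetical simplification; the thesis's residual edge histograms and the 2x-vs-threshold rule here are not taken from the source) labels each 8x8 block of a residual frame as smooth, vertical-edge, horizontal-edge, or mixed, using only its two lowest-frequency AC coefficients:

```python
import numpy as np

def dct_matrix(n=8):
    # Orthonormal type-II DCT basis, as used for 8x8 blocks in video coding.
    k = np.arange(n)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def residual_edge_histogram(residual):
    """Normalized histogram of coarse edge types over 8x8 DCT blocks.

    Bins: 0 = smooth, 1 = vertical edge, 2 = horizontal edge, 3 = mixed.
    """
    C = dct_matrix(8)
    h, w = residual.shape
    hist = np.zeros(4)
    for i in range(0, h - 7, 8):
        for j in range(0, w - 7, 8):
            block = residual[i:i + 8, j:j + 8]
            coef = C @ block @ C.T              # 2-D DCT of the block
            hv = abs(coef[0, 1])                # horizontal frequency -> vertical edge
            vv = abs(coef[1, 0])                # vertical frequency -> horizontal edge
            if max(hv, vv) < 1e-3:
                hist[0] += 1                    # smooth block
            elif hv > 2 * vv:
                hist[1] += 1                    # vertical edge
            elif vv > 2 * hv:
                hist[2] += 1                    # horizontal edge
            else:
                hist[3] += 1                    # diagonal / mixed
    return hist / hist.sum()

demo = np.zeros((16, 16))
demo[:, 4:] = 1.0                               # a vertical step through the left blocks
hist = residual_edge_histogram(demo)
```

In a real decoder the DCT coefficients are read directly from the bitstream, so no inverse transform of the frame is needed; that is the source of the speedup the compressed-domain approach targets.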
In summary, this dissertation, on the one hand, builds on an analysis of video spatio-temporal information to propose new spatio-temporal local feature extraction methods that achieve better recognition accuracy; on the other hand, it extracts spatio-temporal features directly from compressed-domain information, greatly improving recognition speed while maintaining good recognition accuracy.
【Degree-granting institution】: South China University of Technology
【Degree level】: Doctorate
【Year conferred】: 2016
【Classification number】: TP391.41
Article ID: 2224105
Link: https://www.wllwen.com/shoufeilunwen/xxkjbs/2224105.html