Research on Visual Attention Algorithms Based on Feature Fusion
Topic: visual attention + feature fusion; Source: PhD dissertation, China University of Mining and Technology (Beijing), 2017
【Abstract】: Visual attention is one of the central research topics in computer vision: it uses analytical methods such as pattern recognition and machine learning to predict the target or direction a subject is attending to. Visual attention algorithms based on feature fusion build a head feature matrix through feature extraction and fusion, compute head pose or gaze direction from it, and finally determine the attended target or direction. In recent years such algorithms have been widely applied in public security, natural meetings, driver assistance, and many other fields. Although visual attention based on facial features has been studied extensively, many problems remain, chiefly in three areas.

(1) Imbalanced expression of local and global features. Fusion methods typically weight and combine several features extracted from the whole image, which considers only the global effectiveness of fusion and leaves local features under-expressed; alternatively, they consider only local features, extracting them with many different methods and making the global representation overly complex. The salient features of different regions of the same image differ: extracting many feature types globally drives up computational complexity, while extracting too few leaves local information insufficiently expressed. To capture local features as fully and efficiently as possible while keeping global computation tractable, local and global feature expression must be balanced jointly.

(2) Complex head pose representation and inefficient computation. Head pose is a core component of visual attention, and accurate head pose estimation effectively supports the prediction and tracking of attended targets. Estimation methods fall into three categories: appearance-model-based, geometric-model-based, and feature-based. Feature-based methods are easily disturbed by external factors such as head accessories and changes in head position; appearance-model-based methods require large amounts of training data with accurately annotated poses; geometric-model-based methods run in real time but are strictly limited by camera calibration parameters and image resolution, and a single camera cannot capture depth, so even at pixel-level accuracy a pose angle error of roughly 5° remains. Accurate and efficient head pose computation therefore calls for a compact head pose feature matrix and an efficient computation method.

(3) Ambiguity between head pose and gaze direction. Gaze direction and head pose are the two core subjects of visual attention research; they complement each other, and neither alone can accurately express a person's attentional state. Multiple potential targets lie within the same head orientation range, so gaze direction is needed to pinpoint the attended target; conversely, with head orientation fixed, gaze shifts occur, meaning the attended target is changing. Existing work usually treats head pose analysis and gaze estimation as independent problems and does not resolve this ambiguity, so an algorithm that jointly models head pose and gaze direction is urgently needed.

Because of this local/global imbalance, the complexity and inefficiency of head pose representation and computation, and the head-orientation/gaze ambiguity, feature-fusion-based visual attention remains a difficult and challenging topic. This thesis addresses the three problems as follows.

(1) Balancing local and global feature expression. To express local features sufficiently, reduce global complexity, and balance the two, the thesis builds a local feature extraction framework based on information entropy and proposes a head feature matrix for pose estimation that fuses Gabor and Phase Congruency features by weighted entropy. First, information entropy measures the importance of each local image region and determines which features best express that region's original information; then all local features are concatenated compactly into a global feature matrix; finally, validation on public face and head datasets with machine learning classifiers and regressors shows that the proposed matrix, combined with suitable supervised learning, outperforms common global fusion matrices in head pose classification.

(2) Accurate representation and efficient computation of head pose. To improve accuracy and timeliness, the thesis proposes a head pose estimation algorithm based on depth information reconstruction, together with an improved weighted version. First, LBP (Local Binary Pattern) features of the head are extracted to build an Adaboost-LBP face classifier; then depth is reconstructed from the camera imaging model, and head pose is computed from the reconstructed depth and the geometric relation between the target and the camera. To improve depth accuracy, a 68-point face contour model is extracted with ASM (Active Shape Model) to build a weighted depth reconstruction algorithm. Experiments on head pose in visual attention scenes, combining the optimized depth with head features and an appearance model, show that the proposed algorithm and its weighted version outperform common methods in both representation accuracy and computational performance.

(3) Resolving the head pose/gaze ambiguity. A single head pose can correspond to several gaze directions, and the same gaze direction can occur under different head poses, so describing visual attention by either alone is ambiguous. To alleviate this, the thesis proposes a gaze-assisted visual attention algorithm based on an HMM (Hidden Markov Model). First, a deep convolutional neural network learns head data and computes head pose and gaze direction; then the HMM combines gaze direction with head pose to predict the attended direction or target; finally, experiments on public head pose datasets and live video show that the proposed algorithm weakens the ambiguity to some extent and improves target prediction accuracy.

Validation on homogeneous and heterogeneous data from public datasets and video leads to the following conclusions. (1) Fusing Gabor and Phase Congruency features in the weighted-information-entropy framework yields a head pose feature matrix that expresses local features fully, reduces global complexity, balances the two, and improves the classification accuracy and speed of head pose estimation. (2) The proposed depth-reconstruction-based head pose estimation algorithm and its weighted version reconstruct depth accurately and improve both the accuracy of head pose representation and the speed of estimation. (3) The proposed gaze-assisted algorithm, combining gaze direction with head pose through the HMM, alleviates the head-orientation/gaze ambiguity in visual attention and reduces attention error.
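The information-entropy-weighted fusion framework described in the abstract can be sketched roughly as follows. This is a minimal illustration only, assuming precomputed per-pixel Gabor and Phase Congruency response maps and a fixed block grid; the grid size, histogram bins, and the use of per-block mean responses are hypothetical stand-ins for the thesis's actual feature matrix construction.

```python
import numpy as np

def block_entropy(block, bins=16):
    """Shannon entropy (bits) of a block's intensity histogram."""
    hist, _ = np.histogram(block, bins=bins, range=(0.0, 1.0))
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def entropy_weighted_fusion(img, feat_a, feat_b, grid=(4, 4)):
    """Fuse two per-pixel feature maps (e.g. Gabor magnitude and phase
    congruency) block by block: each block's features are weighted by the
    normalized entropy of the underlying image region, then all block
    descriptors are concatenated into one compact global vector."""
    h, w = img.shape
    bh, bw = h // grid[0], w // grid[1]
    descriptors = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            sl = np.s_[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            # normalize entropy to [0, 1] by its maximum log2(bins)
            wgt = block_entropy(img[sl]) / np.log2(16)
            # mean responses stand in for the per-block features
            descriptors.append([wgt * feat_a[sl].mean(),
                                wgt * feat_b[sl].mean()])
    return np.concatenate(descriptors)
```

High-entropy regions (eyes, mouth, hairline) thus dominate the fused descriptor, while flat regions contribute little, which is one way to read the local/global balance argument above.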
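The geometric idea behind reconstructing depth from a single calibrated camera can be illustrated with similar triangles. The focal length, the assumed interpupillary distance, and the yaw heuristic below are all hypothetical illustrations; the thesis's weighted reconstruction over a 68-point ASM contour is considerably more elaborate.

```python
import math

# Assumed constants for illustration; a real system calibrates these.
FOCAL_PX = 800.0      # focal length in pixels (hypothetical)
EYE_DIST_MM = 63.0    # average adult interpupillary distance

def depth_from_eyes(eye_l, eye_r):
    """Pinhole-model depth via similar triangles: Z = f * W / w, where W
    is the real-world eye separation and w its projection in pixels."""
    w_px = math.dist(eye_l, eye_r)
    return FOCAL_PX * EYE_DIST_MM / w_px

def yaw_from_landmarks(eye_l, eye_r, nose):
    """Coarse yaw (degrees) from the nose tip's horizontal offset relative
    to the eye midpoint, scaled by half the eye span -- a toy geometric
    stand-in for a full landmark-based pose solver."""
    mid_x = 0.5 * (eye_l[0] + eye_r[0])
    half_span = 0.5 * abs(eye_r[0] - eye_l[0])
    ratio = max(-1.0, min(1.0, (nose[0] - mid_x) / half_span))
    return math.degrees(math.asin(ratio))
```

With eyes 63 px apart and the constants above, `depth_from_eyes` returns 800 mm; a centered nose gives zero yaw. Weighting several such landmark pairs, as the abstract describes, reduces the sensitivity of any single measurement.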
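Combining discretized head-pose and gaze observations through an HMM, as the gaze-assisted algorithm does, can be sketched with a standard Viterbi decoder over hidden attention targets. The state/observation encoding and all probabilities below are toy assumptions, not the thesis's trained model.

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely sequence of hidden attention targets given discretized
    (head-pose, gaze) observation symbols.
    pi: (S,) initial probs; A: (S, S) transitions; B: (S, O) emissions."""
    S, T = len(pi), len(obs)
    logd = np.log(pi) + np.log(B[:, obs[0]])
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = logd[:, None] + np.log(A)   # scores[i, j]: prev i -> next j
        back[t] = scores.argmax(axis=0)
        logd = scores.max(axis=0) + np.log(B[:, obs[t]])
    path = [int(logd.argmax())]
    for t in range(T - 1, 0, -1):            # backtrack best predecessors
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy example: two attention targets, three joint (pose, gaze) symbols.
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.1, 0.9]])       # attention is sticky over time
B = np.array([[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]])
path = viterbi([0, 0, 2, 2], pi, A, B)       # → [0, 0, 1, 1]
```

The sticky transition matrix is what lets gaze evidence disambiguate a fixed head orientation: a sustained change in the gaze symbol is needed before the decoded target switches, suppressing spurious flips.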
【Degree-granting institution】: China University of Mining and Technology (Beijing)
【Degree level】: Doctoral
【Year conferred】: 2017
【CLC number】: TP391.41
Article ID: 1941275
Link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/1941275.html