Classification-Guided Regression for Hand Pose Estimation

Published: 2018-05-30 21:23

Topic: hand pose estimation + keypoints; source: University of Science and Technology of China, 2017 master's thesis

[Abstract]: As smart homes and smart devices become widespread, interaction between people and devices in daily life will grow ever more frequent. With advances in computing and artificial intelligence in particular, research on contactless human-computer interaction, which better matches the way humans communicate, is becoming increasingly active. This research includes eye tracking, speech recognition, facial expression recognition, lip reading, face recognition, hand gesture recognition, and body pose recognition. Because hand gestures carry rich information and gestural interaction is natural, comfortable, and unconstrained, gesture interaction is an important research direction for future human-computer interaction. The human hand is small, changes its speed and direction of motion quickly, and has fingers with very many degrees of freedom; the fingers look highly similar to one another and easily occlude each other. Estimating the 3D keypoints of the hand quickly and accurately from visual input is therefore a very challenging research problem. To address the complex, high-dimensional hand pose space and cases with large viewpoint changes and heavy occlusion, this thesis follows the idea of "divide and conquer" and proposes a classification-guided regression method for estimating 3D hand keypoints. The method divides one difficult hand pose regression task into several relatively easier subtasks and learns a dedicated regression model for each subtask, avoiding the problem that a single model cannot handle all cases well. First, a deep convolutional neural network classifier, GoogLeNet, is trained offline with depth maps as input. Unlike earlier hand pose classifiers, which define classes by camera viewpoint, the classifier in this thesis defines classes by rigidly aligned hand poses. For every class the classifier can predict, a corresponding cascaded random forest regressor is trained offline. At test time, the depth map is fed to the classifier, which directly predicts a hand pose class; the same depth map is then passed to the cascaded random forest regressor for the predicted class, which outputs the 3D coordinates of the hand keypoints in the camera coordinate system. Extensive experiments verify the efficiency and effectiveness of the proposed classification-guided regression algorithm. Quantitatively, it substantially outperforms holistic regression with a single model trained on all samples. Compared with other strong algorithms, it still leads over most of the range of maximum allowed error thresholds. Qualitatively, the method handles complex poses with large viewing angles and heavy occlusion while maintaining a high frame rate, which fully meets the needs of real-time, accurate applications.
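The routing idea in the abstract (predict a class first, then apply only that class's regressor) can be illustrated with a small sketch. This is not the thesis implementation: a nearest-centroid rule stands in for the GoogLeNet classifier, a one-dimensional least-squares line stands in for each cascaded random forest, and a scalar stands in for the depth map. All function names and the toy data are hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

Predictor = Callable[[float], float]


def fit_line(xs: List[float], ys: List[float]) -> Predictor:
    """Least-squares fit of y = a*x + b (stand-in for a real regressor)."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    b = my - a * mx
    return lambda x: a * x + b


def fit_nearest_centroid(xs: List[float], labels: List[int]) -> Callable[[float], int]:
    """Toy classifier (stand-in for the CNN): pick the class with the closest mean input."""
    centroids: Dict[int, float] = {
        c: mean(x for x, l in zip(xs, labels) if l == c) for c in set(labels)
    }
    return lambda x: min(centroids, key=lambda c: abs(x - centroids[c]))


def fit_class_guided(xs: List[float], ys: List[float], labels: List[int]) -> Predictor:
    """Train one classifier plus one dedicated regressor per class."""
    classify = fit_nearest_centroid(xs, labels)
    regressors: Dict[int, Predictor] = {}
    for c in set(labels):
        cx = [x for x, l in zip(xs, labels) if l == c]
        cy = [y for y, l in zip(ys, labels) if l == c]
        regressors[c] = fit_line(cx, cy)
    # Test time: predict the class, then route the same input to that class's regressor.
    return lambda x: regressors[classify(x)](x)


def mean_abs_error(predict: Predictor, xs: List[float], ys: List[float]) -> float:
    return mean(abs(predict(x) - y) for x, y in zip(xs, ys))


def make_toy_data() -> Tuple[List[float], List[float], List[int]]:
    """Two regimes with different input-output maps (hypothetical data)."""
    xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    ys = [2 * x if l == 0 else -3 * x + 1 for x, l in zip(xs, labels)]
    return xs, ys, labels


xs, ys, labels = make_toy_data()
guided = fit_class_guided(xs, ys, labels)
single = fit_line(xs, ys)  # one model for all samples, for comparison
print("class-guided error:", mean_abs_error(guided, xs, ys))
print("single-model error:", mean_abs_error(single, xs, ys))
```

On this toy data each regime is exactly linear, so the per-class regressors fit it far better than the single line; the sketch shows only the routing structure and says nothing about the accuracy reported in the thesis.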
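[Degree-granting institution]: University of Science and Technology of China
[Degree level]: Master's
[Year conferred]: 2017
[Classification number]: TP391.41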

[Related literature]