Video Action Recognition Based on a Distributed Deep Learning System

Published: 2018-01-15 11:35

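Keywords: Video Action Recognition Based on a Distributed Deep Learning System　Source: University of Science and Technology of China, 2017 master's thesis　Type: degree thesis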

[Abstract]: Video action recognition is the task of recognizing the actions of people in video. It underpins applications such as video surveillance, human-computer interaction, and video sentiment analysis, and it is an active area of deep learning research. As demand for video action recognition grows, two problems have become pressing: how to build an efficient, fast framework for large-scale video action recognition, and how to design a recognition model with better accuracy. Neither problem has a unified solution so far. This thesis studies both: it implements a distributed deep learning system based on remote GPU invocation and proposes an improved video action recognition model. The improved model is then trained on the distributed deep learning system to verify the feasibility and effectiveness of both. The main contributions are as follows:

1. Remote GPU invocation is built with an API-redirection virtualization method, and accelerated training of deep neural networks is implemented on top of it. Multiple remote GPU calls are implemented on a distributed system built with ZeroMQ, forming a distributed deep learning system based on multi-GPU remote invocation, with improvements in the deep learning library cuDNN, P2P, network communication, and other areas. A system built this way can serve as a framework for large-scale video action recognition. Its greatest advantage is that single-machine multi-GPU code can be extended to a distributed setting quickly, with no modification or only minor modification.

2. The traditional approach to video action recognition is improved with a new model, iRCN, that requires no hand-crafted features. The model uses global sampling: the video is divided into different temporal stages, and frames are sampled from each stage. A 3D CNN extracts the motion features of each stage, a biLSTM then captures the correlations among all the motion features in the time domain, and Softmax serves as the loss function. The iRCN model achieves accuracies of 85.6% and 56.6% on the UCF-101 and HMDB-51 datasets, respectively. To date, this is the highest accuracy among all video action recognition methods that do not rely on hand-crafted features.

3. The distributed deep learning system supports fast distributed extension of both data parallelism and model parallelism. The system is used to train the improved video action recognition model effectively. Training covers two parts: video feature extraction based on an improved MapReduce, and distributed training of the complete model. The system effectively increases the speed of video feature extraction, and the experiments also verify the feasibility and effectiveness of both the distributed deep learning system and the improved video action recognition model. Finally, a model-parallel distributed extension is used to illustrate the strengths and weaknesses of the distributed deep learning system.
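
The sampling step described in the abstract (dividing the video into temporal stages and sampling frames from each stage) can be illustrated with a minimal sketch. The abstract does not state how many stages are used, how long each sampled clip is, or where in a stage the frames are taken, so the function name `sample_segment_clips`, its parameters, and the choice of one centered clip per stage are all assumptions made for illustration, not the thesis's method.

```python
def sample_segment_clips(num_frames, num_segments, clip_len):
    """Return one clip of consecutive frame indices per temporal segment.

    Hypothetical helper: the segment count, clip length, and the choice of a
    clip centered in each segment are illustrative assumptions.
    """
    if num_segments <= 0 or clip_len <= 0:
        raise ValueError("num_segments and clip_len must be positive")
    if num_frames < clip_len:
        raise ValueError("video is shorter than one clip")

    clips = []
    for s in range(num_segments):
        # Integer boundaries of segment s over the frame range [0, num_frames).
        seg_start = s * num_frames // num_segments
        seg_end = (s + 1) * num_frames // num_segments
        center = (seg_start + seg_end) // 2

        # Center the clip in the segment, then clamp it inside the video.
        start = center - clip_len // 2
        start = max(0, min(start, num_frames - clip_len))
        clips.append(list(range(start, start + clip_len)))
    return clips


if __name__ == "__main__":
    # A 90-frame video split into 3 stages, with one 16-frame clip per stage.
    for clip in sample_segment_clips(90, 3, 16):
        print(clip[0], clip[-1])
```

In a pipeline of the kind the abstract outlines, each returned index list would select the frames fed to the 3D CNN for one stage, and the per-stage features would then be passed in temporal order to the biLSTM.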
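[Degree-granting institution]: University of Science and Technology of China
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.41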

[References]