Research on Human Action Recognition Based on Convolutional Neural Networks

Published: 2018-10-16 12:42

[Abstract]: In recent years, the arrival of high-definition video equipment has driven rapid progress in artificial intelligence based on action recognition in areas such as smart and safe cities, smart homes, and military security. Broad application prospects and economic value have quickly made action analysis and recognition a research hotspot in computer vision. Traditional action recognition algorithms usually consist of three steps: moving-foreground detection, feature extraction, and classifier training and recognition. Although their recognition rates are acceptable, they are not robust and require a great deal of manual work. Moreover, in real scenes, occlusion between targets, complex and varied backgrounds, and unfixed camera angles make traditional methods struggle or fail outright. This thesis uses convolutional neural networks (CNNs) to address these problems, aiming to improve robustness while raising recognition accuracy as far as possible.

Background subtraction and frame differencing cannot extract a complete foreground when the motion amplitude is small. To overcome this, the thesis proposes a human silhouette extraction algorithm based on Difference of Gaussian (DoG) images. The method first subtracts two images from adjacent levels of the Gaussian scale space to construct a difference image containing the body contour, then applies binarization enhancement, morphological processing, and related operations to obtain a rough silhouette image. In the second step, it scans each row of the rough silhouette region against a threshold and applies a closing operation and related steps to obtain a complete and accurate silhouette image. To fuse the temporal information of the image sequence, the silhouette images within one period are accumulated into a two-dimensional feature map, which is fed to a CNN for training and recognition. After network tuning and five-fold cross-validation, the framework achieves an average accuracy of 85.3% on the public KTH dataset, which shows that the framework is feasible to a degree.

To handle video data better, researchers have extended CNNs to three dimensions. Experiments with a 3D CNN in this thesis show that the feature combination "optical flow image, frame-difference image, three-frame difference image" gives the best recognition results; after network tuning and five-fold cross-validation, it reaches an average accuracy of 92.0% on KTH. Next, by analyzing the proportion of samples in each KTH class and the corresponding per-class accuracy, the thesis proposes three improvements, namely secondary training, an oversampling strategy, and dataset extension, to show that imbalanced data distribution does affect the results and to raise the recognition rate. The three methods reach average accuracies of 93.5%, 92.8%, and 94.7%, respectively, offering an approach to classification on small or imbalanced datasets. In addition, 3D CNN-based action recognition reduces the feature-extraction workload while improving robustness, thereby alleviating the problems of traditional recognition methods.
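The silhouette pipeline described in the abstract (DoG image, binarization, per-row scan, temporal accumulation) can be illustrated with a small NumPy sketch. This is not the thesis's implementation: the function names, the scale ratio `k=1.6`, both thresholds, and the synthetic moving rectangle are assumptions chosen for illustration, the row scan is only one plausible reading of the "threshold scan of each row", and the morphological closing step is omitted for brevity.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with edge padding (NumPy only)."""
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()
    padded = np.pad(img.astype(float), radius, mode="edge")
    # Convolve along rows, then columns; "valid" trims the padding again.
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

def rough_silhouette(frame, sigma=1.0, k=1.6, edge_thresh=0.05):
    """Binarize the difference of two adjacent Gaussian scales.
    sigma, k and edge_thresh are assumed values, not from the thesis."""
    dog = gaussian_blur(frame, sigma) - gaussian_blur(frame, k * sigma)
    return np.abs(dog) > edge_thresh

def fill_rows(mask, min_pixels=2):
    """Assumed row-scan rule: in every row with at least min_pixels
    contour pixels, fill the span between the leftmost and rightmost one."""
    filled = np.zeros(mask.shape, dtype=bool)
    for r, row in enumerate(mask):
        cols = np.flatnonzero(row)
        if cols.size >= min_pixels:
            filled[r, cols[0]:cols[-1] + 1] = True
    return filled

def accumulate(masks):
    """Average binary silhouettes over one period into a 2D feature map."""
    return np.mean(np.stack(masks).astype(float), axis=0)

def make_frame(t, size=32):
    """Hypothetical test data: a bright rectangle drifting right over time."""
    frame = np.zeros((size, size))
    frame[8:24, 10 + t:16 + t] = 1.0
    return frame

silhouettes = [fill_rows(rough_silhouette(make_frame(t))) for t in range(4)]
feature_map = accumulate(silhouettes)  # values in [0, 1], same size as a frame
```

In the resulting `feature_map`, pixels covered by the silhouette in every frame are 1.0, and pixels covered in only some frames take fractional values, so the direction and extent of motion are encoded in a single image that a 2D CNN can consume. Note that the DoG band extends a few pixels past the true edge, so this rough silhouette is slightly dilated compared with the object.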
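[Degree-granting institution]: University of Science and Technology of China
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP391.41; TP183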

[References]