Spatio-Temporal Deep Learning Algorithms for Dynamic Scene Understanding

Published: 2018-06-30 19:09

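Topic: dynamic scene understanding + deep learning; Source: 2017 master's thesis, University of Electronic Science and Technology of China (电子科技大学)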

[Abstract]: Dynamic scene understanding is a sub-problem at the intersection of computer vision and machine learning and has long been an active research topic. This thesis proposes a rule-based algorithm for detecting specific events in dynamic surveillance scenes. To address the shortcomings of that algorithm, it then proposes a spatio-temporal deep learning algorithm for dynamic surveillance scenes and applies it to dynamic traffic scenes, achieving human-like decisions on the steering wheel angle.

For the detection of specific events, the thesis proposes a rule-based dynamic scene understanding algorithm. The algorithm analyzes the characteristics of each specific event, formulates targeted detection rules for different events, uses classical computer vision algorithms such as optical flow and background modeling, and combines them with empirically set constraint checks to detect the corresponding events. Applied to crowd anomaly detection in surveillance scenes, the algorithm reaches an F-Measure of 90.9% for crowd-gathering anomalies. In the crowd-escape anomaly detection task it is robust to illumination changes: in scenes with drastic lighting changes, where other non-learning algorithms can barely detect anything, the F-Measure still reaches 61.24%.

Because the rules of the rule-based method are hard to formulate and hard to generalize, the thesis proposes a deep learning algorithm for dynamic scene analysis that is not tailored to one class of events but applies to many kinds of events. A multi-stream 3D convolutional network extracts rich high-level features from dynamic scene data, and these features are fused to classify the content of the data. Combining the network with the empirical constraints already used in the rule-based method allows events in dynamic surveillance scenes to be detected effectively. During training, pre-training followed by fine-tuning mitigates, to some extent, the shortage of training samples. During fine-tuning and detection, a spatio-temporal block strategy improves detection performance. In crowd anomaly detection in surveillance scenes, this multi-event deep learning algorithm performs slightly better than the method whose rules are designed specifically for each event.

For decision-making in dynamic traffic scenes, the thesis improves and applies the multi-stream spatio-temporal 3D convolutional network described above. Several techniques known to improve convolutional networks are applied to build a spatio-temporal decision network. By learning, from the driving data of experienced drivers, features that help the network understand dynamic traffic scenes, the improved network successfully decides the steering wheel angle while the vehicle is driving. The mean absolute error of the resulting spatio-temporal decision network is 0.762° lower than that of the commonly used two-dimensional convolutional neural network.
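The abstract reports its results with two metrics: F-Measure for event detection and mean absolute error for the steering wheel angle. The following is a minimal sketch of how these metrics are commonly computed, not the thesis's own evaluation code. It assumes the F-Measure is the balanced F1 score (`beta = 1`), which the abstract does not state; the function names and the sample numbers are hypothetical and are not data from the thesis.

```python
from typing import Sequence


def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-measure from detection counts.

    tp, fp, fn are true positives, false positives and false negatives.
    beta = 1 gives the balanced F1 score (assumed here, not confirmed
    by the abstract).
    """
    if tp == 0:
        return 0.0  # no correct detections: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def mean_absolute_error(predicted: Sequence[float],
                        actual: Sequence[float]) -> float:
    """Mean absolute error, e.g. between predicted and recorded angles in degrees."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Made-up illustration values, not results from the thesis.
example_f = f_measure(tp=8, fp=2, fn=2)
example_mae = mean_absolute_error([1.0, -2.0, 0.5], [0.0, -1.0, 0.5])
```

A reduction such as the 0.762° quoted in the abstract would then be the difference between the mean absolute errors of the two networks on the same test data.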
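[Degree-granting institution]: University of Electronic Science and Technology of China (电子科技大学)
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP391.41; TP181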
