GPGPU-Oriented Parallel Optimization and Design of a Crowd Counting Algorithm for Embedded Platforms
Published: 2018-07-31 06:02
[Abstract]: The rapid growth of China's urban population has greatly increased the likelihood of crowd gatherings in public places. Stampedes, disorder, and other abnormal crowd events caused by such gatherings have resulted in heavy losses of life and property. Effectively monitoring and managing crowd dynamics in public places such as subway stations, shopping malls, and squares has therefore become a pressing practical problem. Crowd size is the main indicator of an abnormal crowd event: if the number of people in a monitored area is known before an event occurs, managers can disperse the gathering crowd in time and prevent the event. In recent years, rapid gains in GPU hardware performance have made general-purpose GPU computing a new way to accelerate digital image processing algorithms. To support early warning of abnormal crowd events, this thesis proposes a crowd counting algorithm for surveillance video and uses GPGPU techniques to accelerate its bottleneck modules in hardware. First, based on the characteristics of surveillance video of public places such as squares and passageways, the crowd counting algorithm is designed and implemented using foreground extraction, edge detection, and object recognition and tracking. A per-module timing analysis of the algorithm identifies ViBe foreground extraction and Canny edge detection as the bottlenecks. Next, both modules are parallelized with the cross-platform OpenCL heterogeneous computing framework. For ViBe foreground extraction, NDRange index-space optimization and asynchronous execution are used to accelerate model initialization and model update on the GPU.
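The abstract does not describe the ViBe kernels themselves. As a minimal sketch, assuming standard ViBe as published by Barnich and Van Droogenbroeck with its default parameters (20 samples per pixel, matching radius 20, at least 2 matches, update probability 1/16) and grayscale frames stored as nested Python lists, the per-pixel logic looks roughly like the following. All names are hypothetical and this is not the thesis's code. `classify_pixel` reads only one pixel's sample set, so it is the kind of work a single OpenCL work-item could do in a 2-D NDRange with one work-item per pixel; the random neighbour update can write into another pixel's model, which a GPU version has to handle. The asynchronous-execution optimization is host-side scheduling and is not modeled here.

```python
import random

# Default ViBe parameters from the original paper (assumed; the thesis's values are not given).
N_SAMPLES = 20    # background samples stored per pixel
RADIUS = 20       # a sample "matches" if |sample - pixel| < RADIUS
MIN_MATCHES = 2   # matches needed to call a pixel background
SUBSAMPLE = 16    # each update fires with probability 1 / SUBSAMPLE


def neighbours(y, x, h, w):
    """8-connected neighbours of (y, x) that lie inside an h x w image."""
    return [(y + dy, x + dx)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            if (dy or dx) and 0 <= y + dy < h and 0 <= x + dx < w]


def init_model(frame, rng):
    """Model initialization: fill each pixel's samples from its 3x3 neighbourhood in one frame."""
    h, w = len(frame), len(frame[0])
    model = []
    for y in range(h):
        row = []
        for x in range(w):
            pool = neighbours(y, x, h, w) + [(y, x)]
            row.append([frame[ny][nx] for ny, nx in
                        (rng.choice(pool) for _ in range(N_SAMPLES))])
        model.append(row)
    return model


def classify_pixel(model, frame, y, x):
    """Per-pixel work (one work-item in a 2-D NDRange): True means foreground."""
    value, matches = frame[y][x], 0
    for sample in model[y][x]:
        if abs(sample - value) < RADIUS:
            matches += 1
            if matches >= MIN_MATCHES:
                return False  # close to enough samples: background
    return True


def update_model(model, frame, mask, rng):
    """Conservative, randomly subsampled update; foreground pixels never enter the model."""
    h, w = len(frame), len(frame[0])
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                continue
            if rng.randrange(SUBSAMPLE) == 0:  # refresh one of this pixel's own samples
                model[y][x][rng.randrange(N_SAMPLES)] = frame[y][x]
            nb = neighbours(y, x, h, w)
            if nb and rng.randrange(SUBSAMPLE) == 0:  # spatial diffusion into a neighbour
                ny, nx = rng.choice(nb)
                model[ny][nx][rng.randrange(N_SAMPLES)] = frame[y][x]


def vibe_step(model, frame, rng):
    """Classify every pixel independently, then update the model."""
    h, w = len(frame), len(frame[0])
    mask = [[classify_pixel(model, frame, y, x) for x in range(w)] for y in range(h)]
    update_model(model, frame, mask, rng)
    return mask


rng = random.Random(42)
background = [[100] * 4 for _ in range(4)]
model = init_model(background, rng)
frame = [row[:] for row in background]
frame[1][2] = 200  # a bright object covers one pixel
mask = vibe_step(model, frame, rng)
print(sum(map(sum, mask)))  # → 1
```

Because of the conservative update, the bright pixel's value never enters any sample set, so it keeps being reported as foreground until the object leaves.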
For Canny edge detection, memory access optimization, separable convolution, fewer memory accesses, and a bounded number of iterations are applied, respectively, to Gaussian image filtering, gradient magnitude and direction computation, non-maximum suppression, and double-threshold edge linking. Performance tests of the ViBe and Canny implementations before and after optimization show that the optimized versions reduce execution time and improve efficiency without degrading the processing results. Finally, the parallelized crowd counting algorithm is integrated into a surveillance system, which is implemented and tested on an embedded platform. Functional comparisons and performance tests of the complete system show that the OpenCL parallel design markedly improves the efficiency of the most time-consuming bottleneck modules. With GPU acceleration, overall system performance improves by 45% to 60% without affecting system functionality or monitoring quality.
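To make the four Canny stages concrete, here is a minimal pure-Python sketch, assuming grayscale nested-list input, a 5-tap binomial kernel as the Gaussian approximation, Sobel gradients, four-direction quantization for non-maximum suppression, and a fixed pass limit for edge linking. Names, thresholds, and the pass limit are hypothetical; this is not the thesis's OpenCL implementation. Both smoothing and Sobel are written as two 1-D passes: a 5x5 Gaussian costs 25 multiply-adds per pixel, while two 5-tap passes cost 10. The hysteresis loop reads one edge map and writes a fresh one per pass, the data-parallel style a GPU kernel could use instead of recursive edge tracing.

```python
import math

GAUSS_5 = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]  # binomial approximation of a Gaussian


def clamp(v, lo, hi):
    return max(lo, min(v, hi))


def filter_rows(img, kernel):
    """1-D horizontal correlation with clamp-to-edge borders."""
    h, w, r = len(img), len(img[0]), len(kernel) // 2
    return [[sum(k * img[y][clamp(x + i - r, 0, w - 1)] for i, k in enumerate(kernel))
             for x in range(w)] for y in range(h)]


def filter_cols(img, kernel):
    """1-D vertical correlation with clamp-to-edge borders."""
    h, w, r = len(img), len(img[0]), len(kernel) // 2
    return [[sum(k * img[clamp(y + i - r, 0, h - 1)][x] for i, k in enumerate(kernel))
             for x in range(w)] for y in range(h)]


def gaussian_blur(img):
    # Two 5-tap passes (10 multiply-adds per pixel) instead of one 5x5 pass (25).
    return filter_cols(filter_rows(img, GAUSS_5), GAUSS_5)


def sobel(img):
    # Sobel is separable: smooth with [1, 2, 1] perpendicular to the derivative,
    # then differentiate with [-1, 0, 1].
    smooth, diff = [1, 2, 1], [-1, 0, 1]
    gx = filter_rows(filter_cols(img, smooth), diff)
    gy = filter_cols(filter_rows(img, smooth), diff)
    return gx, gy


def non_max_suppression(gx, gy):
    """Keep a pixel only if its magnitude is a local max along the gradient direction."""
    h, w = len(gx), len(gx[0])
    mag = [[math.hypot(gx[y][x], gy[y][x]) for x in range(w)] for y in range(h)]
    out = [[0.0] * w for _ in range(h)]  # border pixels stay 0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            angle = math.degrees(math.atan2(gy[y][x], gx[y][x])) % 180
            if angle < 22.5 or angle >= 157.5:
                dy, dx = 0, 1    # roughly horizontal gradient
            elif angle < 67.5:
                dy, dx = 1, 1    # roughly 45 degrees (y grows downward)
            elif angle < 112.5:
                dy, dx = 1, 0    # roughly vertical gradient
            else:
                dy, dx = 1, -1   # roughly 135 degrees
            m = mag[y][x]
            if m >= mag[y + dy][x + dx] and m >= mag[y - dy][x - dx]:
                out[y][x] = m
    return out


def hysteresis(nms, low, high, max_iters=10):
    """Double-threshold edge linking with a fixed cap on the number of passes."""
    h, w = len(nms), len(nms[0])
    strong = [[v >= high for v in row] for row in nms]
    weak = [[low <= v < high for v in row] for row in nms]
    for _ in range(max_iters):
        nxt = [row[:] for row in strong]  # read the old map, write a new one
        changed = False
        for y in range(h):
            for x in range(w):
                if weak[y][x] and not strong[y][x] and any(
                        strong[ny][nx]
                        for ny in range(max(y - 1, 0), min(y + 2, h))
                        for nx in range(max(x - 1, 0), min(x + 2, w))):
                    nxt[y][x] = True
                    changed = True
        strong = nxt
        if not changed:
            break
    return strong


def canny(img, low, high, max_iters=10):
    gx, gy = sobel(gaussian_blur(img))
    return hysteresis(non_max_suppression(gx, gy), low, high, max_iters)


step = [[0] * 4 + [255] * 4 for _ in range(8)]  # vertical step edge
edges = canny(step, low=100, high=300)
print("".join("#" if e else "." for e in edges[3]))  # → ...##...

chain = [[0, 350, 150, 150, 0]]  # one strong pixel followed by two weak ones
print(hysteresis(chain, 100, 300, max_iters=1))  # → [[False, True, True, False, False]]
```

Capping the pass count bounds the run time of the linking stage, at the cost of possibly leaving the far end of a long weak chain unlinked, as the `max_iters=1` call shows.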
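[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree]: Master's
[Year conferred]: 2017
[Classification number]: X924; TP391.41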
Article ID: 2154652
Article link: https://www.wllwen.com/kejilunwen/anquangongcheng/2154652.html