Research on the Hardware Architecture of the Semi-Global Matching Algorithm for Binocular Vision

Published: 2018-04-15 21:41

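Topic of this thesis: stereo vision system + semi-global matching; Reference: master's thesis, University of Science and Technology of China, 2017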

[Abstract]: As an important branch of computer vision, stereo vision is widely used in embedded applications such as biomedicine, autonomous driving, and human-computer interaction. A stereo vision system typically consists of four parts: camera calibration, image rectification, stereo matching, and 3D reconstruction. From two images captured by a binocular camera from different viewpoints, the system recovers object depth information through a series of processing steps. Because captured scenes are affected by illumination, noise, reflections, perspective distortion, occlusion, repetitive patterns, and low texture, stereo matching is the most important and most difficult part of the whole system, and matching accuracy directly affects the quality of the 3D reconstruction. In addition, real-time processing of high-resolution images, without sacrificing matching accuracy, is a basic requirement of practical applications. This thesis proposes a high-accuracy, high-throughput, fully pipelined architecture with disparity-level and row-level parallelism, based on weighted semi-global aggregation, and builds an FPGA-based hardware architecture and verification model. The main contributions are as follows:
(1) A high-accuracy, high-throughput hardware architecture based on semi-global stereo matching is proposed. The architecture computes 64 disparities in parallel, the pixels of two adjacent rows in parallel, and the costs of five paths in parallel. A dedicated structure using ping-pong buffers and time-division multiplexing makes the semi-global aggregation module fully pipelined and raises the throughput of the whole architecture.
(2) Software implementations of the semi-global algorithm usually aggregate costs along 8 or 16 paths. Because some of these paths run against the direction of the data flow, large amounts of resources are needed to store intermediate data, so existing semi-global hardware implementations reduce the number of paths to 4 at the cost of accuracy. Without using external memory, this thesis proposes a semi-global hardware architecture with adaptive path weighting based on five-path aggregation; experimental results show that disparity-map accuracy improves by 3.69%.
(3) To improve accuracy in disparity-discontinuity regions, Laplacian edge enhancement is applied to the reference image when computing the penalty factors of the smoothness constraint. In addition, in the disparity refinement stage, a fragment-removal module is added to eliminate outliers in the disparity map, and a modified equiangular interpolation method is used for sub-pixel interpolation, improving the accuracy of the 3D reconstruction.
(4) FPGA-based hardware verification of the architecture is completed. Matching accuracy is tested on image pairs provided by the Middlebury benchmark, and the average error rate is 6.03%. On an Altera Stratix V FPGA the architecture reaches a maximum operating frequency of 156 MHz, with a maximum disparity range of 64 and a throughput of 1280 × 960 at 197 fps (frames per second). On the Xilinx VC707 development board it runs at 100 MHz with a throughput of 1280 × 960 at 126 fps. However, the custom ping-pong buffer structure consumes a large amount of on-chip memory for data buffering. In summary, the architecture fully meets the high-accuracy and real-time requirements of embedded applications.
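The semi-global cost aggregation that contribution (2) builds on can be illustrated with a minimal sketch. It implements only the standard single-path recurrence from the semi-global matching literature, not the weighted five-path hardware variant of the thesis, whose weights and penalty computation are not given in the abstract. The function names `aggregate_path` and `winner_take_all` and the penalty values used in the example are assumptions chosen for illustration.

```python
def aggregate_path(costs, p1, p2):
    """Aggregate matching costs along one path (standard SGM recurrence).

    costs[i][d] is the matching cost of the i-th pixel on the path at
    disparity d. p1 penalizes a disparity change of 1 between neighbours,
    p2 penalizes any larger change (illustrative constants here; the
    thesis derives its penalties from an edge-enhanced reference image).
    """
    num_disp = len(costs[0])
    aggregated = [list(costs[0])]  # the first pixel has no predecessor
    for i in range(1, len(costs)):
        prev = aggregated[-1]
        prev_min = min(prev)
        row = []
        for d in range(num_disp):
            best = prev[d]                            # same disparity
            if d > 0:
                best = min(best, prev[d - 1] + p1)    # disparity d - 1
            if d + 1 < num_disp:
                best = min(best, prev[d + 1] + p1)    # disparity d + 1
            best = min(best, prev_min + p2)           # any larger jump
            # Subtracting prev_min keeps the aggregated values bounded.
            row.append(costs[i][d] + best - prev_min)
        aggregated.append(row)
    return aggregated


def winner_take_all(cost_rows):
    """Pick the disparity with the lowest cost for each pixel."""
    return [min(range(len(row)), key=row.__getitem__) for row in cost_rows]


# Three pixels on one path, four disparity candidates each.
# The middle pixel is ambiguous (low texture): all costs are equal.
costs = [
    [9, 1, 9, 9],
    [5, 5, 5, 5],
    [9, 1, 9, 9],
]
raw = winner_take_all(costs)
smoothed = winner_take_all(aggregate_path(costs, p1=1, p2=4))
print(raw, smoothed)
```

In this toy input the ambiguous middle pixel takes disparity 0 without aggregation and disparity 1, matching its neighbours, once the path cost is aggregated. A full implementation would sum the aggregated costs over several path directions before the winner-take-all step; the inner loop over `d` is the part a disparity-parallel hardware design evaluates for all candidates at once.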
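[Degree-granting institution]: University of Science and Technology of China
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.41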

[Similar literature]