Design and Implementation of a Prototype FPGA-Based Parallel Acceleration Experiment Platform

Published: 2018-03-16 01:20

Topic: PCI Express　Source: Shandong University, master's thesis (2013)　Type: degree thesis

[Abstract]: In recent years, driven by new concepts such as the Internet of Things and by advances in computer technology, embedded systems have been developing at an unprecedented pace, and new kinds of embedded devices keep emerging. These devices demand ever more intelligence and real-time performance, and therefore ever more computation. Traditional embedded processors, however, are limited in performance and clock frequency, and in many cases a single processor can no longer meet these demands. Using several embedded processors to raise processing speed would greatly increase power consumption, which is unsuitable for energy-constrained embedded devices. Under these conditions, a heterogeneous architecture that pairs a field-programmable gate array (FPGA) with an embedded processor has become one of the ideal solutions. Many FPGA-based parallel acceleration models exist, and using an FPGA as a coprocessor to accelerate specific algorithms is an active research topic. After an algorithm has been parallelized on an FPGA, however, its speedup is usually estimated through simulation and analysis rather than measured on real hardware. The main reason is that board-level testing requires large volumes of fast data exchange with the main controller, and no platform for this kind of exchange has been available. A parallel acceleration experiment platform capable of high-speed data exchange is therefore needed for board-level measurement of acceleration results. This thesis designs a prototype of such a platform. To meet the data-exchange speed requirement, the platform exchanges data with the main controller over the PCI Express bus and uses DMA transfers to speed up data movement. The thesis presents the overall design of the platform and the steps and methods used to implement it, following a top-down modular design approach.
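As a rough illustration of what a DMA engine does on the read path, the Python sketch below splits one DMA read into PCI Express memory read requests. It relies on two rules from the PCI Express specification: a read request may not exceed the device's Max_Read_Request_Size, and it may not cross a 4 KB address boundary. The function name `split_dma_read`, the default request size of 512 bytes, and the DWORD-alignment restriction are assumptions made for illustration, not details taken from the thesis.

```python
from typing import List, Tuple

PAGE = 4096  # PCIe rule: a memory read request must not cross a 4 KB boundary


def split_dma_read(addr: int, length: int, mrrs: int = 512) -> List[Tuple[int, int]]:
    """Split one DMA read into (address, byte_count) memory read requests.

    Hypothetical helper. Assumes addr and length are DWORD (4-byte) aligned;
    mrrs stands for the negotiated Max_Read_Request_Size.
    """
    if addr % 4 or length % 4:
        raise ValueError("sketch assumes DWORD-aligned address and length")
    if mrrs not in (128, 256, 512, 1024, 2048, 4096):
        raise ValueError("MRRS must be a power of two from 128 to 4096")
    requests = []
    while length > 0:
        to_boundary = PAGE - (addr % PAGE)     # bytes left before the next 4 KB boundary
        chunk = min(length, mrrs, to_boundary)  # respect both MRRS and the boundary rule
        requests.append((addr, chunk))
        addr += chunk
        length -= chunk
    return requests
```

For example, `split_dma_read(0x0F80, 1024, mrrs=512)` returns requests equal to `[(0x0F80, 128), (0x1000, 512), (0x1200, 384)]`: the first request stops at the 4 KB boundary at 0x1000, and the remaining ones are capped by the request size. A hardware packetizer would presumably apply the same arithmetic before emitting one TLP header per request.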
The platform is divided into a PCI Express endpoint controller module, a PCI Express transaction-layer packet processing and DMA control module, a memory controller module, a parallel acceleration experiment module, and an interface module between the parallel acceleration module and the memory controller. The transaction-layer packet processing and DMA control module is the core of the platform; its logic is complex and it contains many submodules, so the thesis focuses on its detailed design and implementation. It is divided into the following submodules, each implemented separately: a transmit unit, a receive unit, a DMA controller, a read-request packetizer, a transmit-data arbitration and preparation module, a receive-data distribution module, a DMA-to-memory-controller interface module, and a DMA-to-parallel-acceleration-module interface. The design and implementation of the other modules are also described. Taking sorting as an example, the thesis then presents the implementation of a parallel sorting accelerator, builds the parallel acceleration module on top of it, and thereby completes the whole experiment platform. Finally, the platform is tested, and measured figures are given for its resource utilization, maximum data-exchange rate, and actual speedup. The experiments show that the platform meets the requirements of parallel acceleration experiments and can be used for board-level testing of parallel algorithm acceleration.
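The abstract does not say which parallel sorting scheme the accelerator uses. Sorting networks are a common choice on FPGAs because each stage consists of independent compare-and-swap units that can all operate in the same clock cycle. The Python sketch below models one such network, odd-even transposition sort, purely as a stand-in; the function name is hypothetical, and the thesis's actual accelerator may be organized differently.

```python
from typing import List


def odd_even_transposition_sort(data: List[int]) -> List[int]:
    """Software model of an odd-even transposition sorting network (hypothetical).

    Every compare-and-swap within one stage touches a disjoint pair of
    elements, so in hardware they can run in parallel; n stages sort n
    elements. Here the stages are simulated one after another.
    """
    a = list(data)
    n = len(a)
    for stage in range(n):
        start = stage % 2  # even stages compare (0,1),(2,3)...; odd stages (1,2),(3,4)...
        # The comparators in this inner loop are independent of each other,
        # which is what an FPGA implementation would exploit.
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a
```

For n elements the network needs n stages of at most n/2 (rounded down) comparators each, so a hardware version can either unroll every stage into a pipeline or reuse a single stage's comparators over n cycles, trading logic resources against latency. For example, `odd_even_transposition_sort([5, 3, 8, 1])` returns `[1, 3, 5, 8]`.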
[Degree-granting institution]: Shandong University
[Degree level]: Master's
[Year awarded]: 2013
[CLC classification]: TP368.1; TP338.6
