Design of Shared Memory for Multiple OpenCL Devices Based on Automatic Memory Access Pattern Analysis
Published: 2018-11-04 20:37
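【Abstract】: OpenCL offers good functional portability and is an ideal programming model for master-slave (host-device) heterogeneous multi-device systems. However, to fully exploit the computing power of the whole heterogeneous system, programmers must explicitly distribute the workload across devices, control data transfers between devices, and so on, which clearly adds to their burden. This thesis proposes multi-OpenCL-device shared memory (OMSM), in which runtime support for shared memory frees programmers from explicitly controlling data transfers. OMSM has two main tasks: task partitioning and memory management. Both can be automated fundamentally because workgroups in the OpenCL programming model are independent. The independence of workgroups in the index space reduces task partitioning to assigning different numbers of workgroups to devices, and it also means that the regions written by different workgroups cannot overlap, so the regions each workgroup accesses are fairly regular. Automating memory access analysis is the key to automating the whole system. This thesis first analyzes the memory access patterns of workgroups and, taking the characteristics of kernel programs into account, proposes a constrained linear abstract description to characterize the access pattern of a kernel's workgroups. To operate on this description efficiently, we designed intersection, normalization, independent-variable elimination, merging, and solving operations, and implemented an automatic memory access pattern analysis tool on the open-source LLVM compiler framework. Once the access information is obtained, the OMSM runtime works in two stages at execution time: first, it profiles each device in the system to balance the load; second, it uses a segment table to describe how data are distributed across the devices and controls data transfers automatically. Experimental results show that OMSM is highly applicable to kernels without indirect accesses and achieves considerable performance improvements on both homogeneous and heterogeneous multi-device platforms.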
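Because workgroups are independent, dividing a kernel across devices reduces to choosing how many workgroups each device receives. The sketch below is a minimal illustration, not the thesis's implementation: it assumes each device's throughput (workgroups per second) has already been measured in a profiling run, and it hands out contiguous ranges of workgroup IDs in proportion to that throughput. The function name `split_workgroups` and the largest-remainder rounding rule are assumptions made for illustration.

```python
from typing import Dict


def split_workgroups(num_groups: int, throughput: Dict[str, float]) -> Dict[str, range]:
    """Hypothetical helper: assign contiguous ranges of workgroup IDs to
    devices in proportion to their profiled throughput (groups/second)."""
    total = sum(throughput.values())
    if num_groups < 0 or total <= 0:
        raise ValueError("need a non-negative group count and positive throughput")
    devices = list(throughput)
    # Ideal (fractional) share of the workgroups for each device.
    ideal = [num_groups * throughput[d] / total for d in devices]
    counts = [int(x) for x in ideal]
    # Largest-remainder rounding: give the leftover groups to the devices
    # with the largest fractional parts so the counts sum to num_groups.
    leftover = num_groups - sum(counts)
    order = sorted(range(len(devices)), key=lambda i: ideal[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    # Contiguous ranges keep each device's write region contiguous when the
    # write index grows monotonically with the group ID.
    result: Dict[str, range] = {}
    start = 0
    for d, c in zip(devices, counts):
        result[d] = range(start, start + c)
        start += c
    return result
```

For example, with 8 workgroups and a GPU profiled at three times the CPU's throughput, the CPU would receive groups 0 to 1 and the GPU groups 2 to 7.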
[Abstract]: OpenCL offers good portability and is an ideal programming model for master-slave (host-device) heterogeneous multi-device systems. However, to make full use of the computing power of the whole heterogeneous system, the programmer must explicitly distribute the load among devices, control data transfers between devices, and so on, which undoubtedly increases the programmer's burden. In this thesis, we propose OMSM, a shared memory for multiple OpenCL devices, in which runtime support for shared memory frees programmers from explicitly controlling data transfers. OMSM has two main tasks: task partitioning and memory management. Both tasks can be automated because workgroups in the OpenCL programming model are independent: the independence of workgroups in the index space reduces task partitioning to assigning different numbers of workgroups to each device, and it also ensures that the regions written by different workgroups cannot overlap, which makes workgroup access regions fairly regular. Automating memory access analysis is the key to automating the whole system. We first analyze the memory access patterns of workgroups and, taking the characteristics of kernel programs into account, propose a constrained linear abstract description of the memory access pattern of a kernel's workgroups. To manipulate this description efficiently, we design intersection, normalization, independent-variable elimination, merging, and solving operations, and implement an automatic memory access pattern analysis tool based on the open-source LLVM compiler framework.
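The constrained linear description can be illustrated with a deliberately simplified, hypothetical model: an element index is `const + sum(coeff * var)`, where each variable (for example a group ID, a local ID, or a loop counter) is constrained only by a lower and an upper bound. Under that assumption, solving a description reduces to evaluating the linear form at the corners of the bound box, and intersection and merging become interval operations. The names `LinearAccess`, `intersect`, and `merge` are illustrative; the thesis's actual description, and its normalization and independent-variable elimination operations, are more general than this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]  # inclusive [lo, hi] element indices


@dataclass
class LinearAccess:
    """Hypothetical simplified descriptor: index = const + sum(coeff[v] * v),
    where each variable v ranges over the inclusive bounds in `bounds`."""
    const: int
    coeff: Dict[str, int]
    bounds: Dict[str, Tuple[int, int]]

    def with_bounds(self, var: str, lo: int, hi: int) -> "LinearAccess":
        # Tighten one variable's constraint, e.g. restrict the group ID to
        # the workgroups assigned to a single device.
        new_bounds = dict(self.bounds)
        new_bounds[var] = (lo, hi)
        return LinearAccess(self.const, dict(self.coeff), new_bounds)

    def solve(self) -> Interval:
        # The min/max of a linear form over a box lies at its corners: use each
        # variable's lower or upper bound depending on the coefficient's sign.
        # For strided patterns this interval over-approximates the accessed set.
        lo = hi = self.const
        for v, c in self.coeff.items():
            vlo, vhi = self.bounds[v]
            lo += c * vlo if c >= 0 else c * vhi
            hi += c * vhi if c >= 0 else c * vlo
        return (lo, hi)


def intersect(a: Interval, b: Interval) -> Optional[Interval]:
    # Overlap of two inclusive intervals, or None if they are disjoint.
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def merge(intervals: List[Interval]) -> List[Interval]:
    # Union of intervals; adjacent ones such as [0, 3] and [4, 7] are fused.
    out: List[Interval] = []
    for lo, hi in sorted(intervals):
        if out and lo <= out[-1][1] + 1:
            out[-1] = (out[-1][0], max(out[-1][1], hi))
        else:
            out.append((lo, hi))
    return out
```

For a kernel that writes `C[gid*64 + lid]` with 16 workgroups of 64 work-items, restricting `gid` to groups 0 to 7 and 8 to 15 yields the disjoint write intervals [0, 511] and [512, 1023], whose merge is [0, 1023].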
After the access information is obtained, the OMSM runtime executes in two stages: first, it balances the load by profiling each device in the system; second, it describes the distribution of data across devices with a segment table and controls data transfers automatically. Experimental results show that OMSM is highly applicable to kernels without indirect accesses and achieves considerable performance improvements on both homogeneous and heterogeneous multi-device platforms.
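The runtime's segment table can be sketched as follows, assuming (hypothetically) one table per buffer, element-granularity inclusive ranges, and a set of devices holding an up-to-date copy of each segment. The abstract does not specify the table's layout, so the `SegmentTable` class, its methods, and the policy of copying from the alphabetically first holder are illustrative assumptions. The sketch only computes which copies are needed; a real runtime would issue them through OpenCL calls such as `clEnqueueReadBuffer` and `clEnqueueWriteBuffer`.

```python
from typing import List, Set, Tuple


class SegmentTable:
    """Hypothetical segment table for one buffer of `size` elements.
    Each segment [lo, hi] (inclusive) records the set of devices holding an
    up-to-date copy. Initially only `initial` (the host) holds the data."""

    def __init__(self, size: int, initial: str = "host"):
        self.segs: List[Tuple[int, int, Set[str]]] = [(0, size - 1, {initial})]

    def _split(self, at: int) -> None:
        # Ensure a segment starts exactly at index `at` (no-op if one already
        # does or if `at` lies past the end of the buffer).
        for i, (lo, hi, holders) in enumerate(self.segs):
            if lo < at <= hi:
                self.segs[i:i + 1] = [(lo, at - 1, set(holders)), (at, hi, set(holders))]
                return

    def _covering(self, lo: int, hi: int) -> List[int]:
        # Split at both ends, then return the indices of segments inside [lo, hi].
        self._split(lo)
        self._split(hi + 1)
        return [i for i, (s, e, _) in enumerate(self.segs) if s >= lo and e <= hi]

    def prepare_read(self, device: str, lo: int, hi: int) -> List[Tuple[str, int, int]]:
        """Return the copies (source, lo, hi) needed before `device` reads
        [lo, hi], and record that `device` holds those elements afterwards."""
        copies = []
        for i in self._covering(lo, hi):
            s, e, holders = self.segs[i]
            if device not in holders:
                src = sorted(holders)[0]  # any valid holder; sorted for determinism
                copies.append((src, s, e))
                holders.add(device)
        return copies

    def record_write(self, device: str, lo: int, hi: int) -> None:
        # After a write, only the writing device holds valid data for [lo, hi].
        for i in self._covering(lo, hi):
            s, e, _ = self.segs[i]
            self.segs[i] = (s, e, {device})
```

Adjacent segments with identical holder sets are not coalesced in this sketch; a practical table would likely merge them to keep lookups short.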
【Degree-granting institution】: National University of Defense Technology
【Degree】: Master's
【Year of degree】: 2013
【Classification number】: TP333
Article ID: 2311067
Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2311067.html