
Research on GPGPU Architecture and Performance Analysis

Published: 2018-06-24 14:17

  Topic: GPGPU + Fermi; Source: Jilin University, 2017 master's thesis


[Abstract]: Over the past decade or so, GPU processing performance has grown very rapidly. The GPU differs greatly from the CPU in structure: a GPU devotes more of its transistors to computation, whereas a CPU devotes more of them to control logic. Because of these different design goals, the two have come to play different roles. Going a step further, the GPU has quickly expanded from image processing into general-purpose computing, opening up a new field called GPGPU (General-Purpose Computing on the Graphics Processing Unit). GPGPUs are designed for parallel tasks, so studying parallel computing models is worthwhile. Although classical parallel computing models such as the PRAM, BSP and LogP models were proposed many years ago, studying them gives a deeper understanding of GPGPU architecture. Since the GPGPU concept was introduced, much of the research has concentrated on exploiting its computing power to greatly improve efficiency on particular problems. The main reason is that the detailed chip structure, the pipeline and the memory design all involve trade secrets, so this material is hard to obtain for research. NVIDIA and AMD are the two main GPGPU manufacturers. Of the two, NVIDIA's official documentation is more detailed and its CUDA toolkit is more complete, so this thesis focuses on NVIDIA chips and uses the open-source GPGPU-Sim simulator to simulate NVIDIA GPUs.
The thesis compares several parallel computing models, including the PRAM, BSP and LogP models, examining the similarities and differences of their parameters and their core ideas, and briefly surveys the current state of GPU research. It then proposes a new model, NKGPGPU, which lays out in detail the hardware structure, the logical structure of tasks, the code structure and the mappings among them. Overall, NKGPGPU comprises five sub-models: the hardware structure, task structure, task organization, task execution and task scheduling sub-models. The hardware structure sub-model describes the main components of an NKGPGPU chip. The task organization sub-model gives the code structure suited to NKGPGPU and the mapping between code and tasks, together with a model of the launch relationships between tasks. The task execution sub-model gives the mapping between code and hardware. The task scheduling sub-model gives the mapping between the task topology and the hardware structure. The thesis also gives a performance analysis model that is consistent with the proposed NKGPGPU.
For the three main factors that affect GPGPU performance (the GPGPU pipeline, shared memory and global memory), the thesis carries out detailed experiments with different numbers of threads. The pipeline experiments study how the cycle counts of different instruction types differ and use those differences to infer the relationship between instructions and the pipeline. Shared memory and global memory are studied in a similar way, by measuring the completion cycles of consecutive memory access instructions. The proposed NKGPGPU enriches the theoretical models of GPGPU and gives GPGPU hardware engineers and software programmers a basis for improvement, and the experimental methods and ideas developed for GPGPU-Sim can serve as a foundation for further GPGPU research.
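The abstract compares the parameters of the classical parallel computing models without listing them. As a rough illustration only, the sketch below encodes the textbook BSP and LogP parameters and two standard cost expressions. The class and function names are hypothetical, the numeric values are made up, and none of this is taken from the thesis or from its NKGPGPU model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BSPParams:
    """Textbook BSP machine parameters (names are illustrative)."""
    g: float  # communication cost per word delivered
    l: float  # cost of one barrier synchronization


@dataclass(frozen=True)
class LogPParams:
    """Textbook LogP machine parameters (names are illustrative)."""
    L: float  # network latency of one small message
    o: float  # processor overhead to send or to receive one message
    g: float  # minimum gap between consecutive messages at one processor
    P: int    # number of processors


def bsp_superstep_cost(w: float, h: int, p: BSPParams) -> float:
    """Cost of one BSP superstep: local work w, an h-relation, one barrier."""
    return w + h * p.g + p.l


def logp_message_time(p: LogPParams) -> float:
    """End-to-end time of one small message: send overhead, latency, receive overhead."""
    return 2 * p.o + p.L


if __name__ == "__main__":
    # Made-up parameter values, chosen only to exercise the formulas.
    print(bsp_superstep_cost(100.0, 10, BSPParams(g=4.0, l=50.0)))
    print(logp_message_time(LogPParams(L=6.0, o=2.0, g=4.0, P=8)))
```

PRAM is left out because it assumes unit-cost access to shared memory and so has no communication parameters to compare. That gap between the models is part of what such a comparison would bring out.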
[Degree-granting institution]: Jilin University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP391.41;TP332

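[Similar literature]

Related journal articles (top 7)

1 尹芳; Submodel method for structural stress analysis based on ANSYS Workbench [J]; 武汉轻工大学学报; 2014, No. 02

2 谢晓丹; Understanding the CSS box model in depth [J]; 福建电脑; 2011, No. 07

3 甘杜芬; 吴飞燕; Research and application of positioning methods in the CSS box model [J]; 计算机光盘软件与应用; 2013, No. 06

4 王能超; 李小妹; A sequence-search optimization algorithm for the lattice model [J]; 小型微型计算机系统; 2005, No. 10

5 康亚明; 杨明成; Finite element analysis of stress concentration at hole edges based on submodels [J]; 湖南工程学院学报(自然科学版); 2005, No. 04

6 王安华; 涂序彦; A multiple generalized operator model for gas field production scheduling [J]; 微计算机信息; 2006, No. 34

7 彭云; 易龙; 南英; Buckling stability analysis and optimization techniques for composite box-section structures [J]; 航空计算技术; 2006, No. 05

Related conference papers (top 1)

1 杨庆山; 李启; Comparison of subgrid-scale models in large eddy simulation of flow around bluff bodies [A]; Proceedings of the 14th National Conference on Structural Wind Engineering (Volume 2) [C]; 2009

Related doctoral dissertations (top 1)

1 李应林; A subgrid-scale model based on swirling strength and its application to large eddy simulation of incompressible flows [D]; University of Science and Technology of China; 2015

Related master's theses (top 3)

1 郭康瑞; A submodel-based method for calculating the shear buckling capacity of web posts in castellated beams [D]; Shandong University; 2017

2 邢千里; Research on GPGPU Architecture and Performance Analysis [D]; Jilin University; 2017

3 周宇; Refined strength calculation of railway vehicle structures based on submodels [D]; Dalian Jiaotong University; 2008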



Article ID: 2061805


Link to this article: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2061805.html

