Performance Portability Analysis of OpenACC 2.0

Published: 2021-09-17 16:48
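  In high-performance computing, application performance used to rise along with processor performance under "Moore's Law". Programmers got a "free lunch" from faster processors without rewriting their code. That trend can no longer continue because clock frequencies have hit a power-consumption wall, so HPC has turned to heterogeneous parallel computing with accelerators to raise performance further. OpenACC is a directive-based standard for heterogeneous parallel programming. It lets programmers write code without handling the complex low-level architecture of heterogeneous accelerators, which makes heterogeneous parallel programming easier. In addition, an OpenACC compiler can generate parallel code for different platforms from this high-level programming model, so applications written in OpenACC are highly portable across platforms. Many supercomputers on the Top500 list, such as Tianhe-2, Titan, and TSUBAME 2.5, draw a large share of their performance from heterogeneous parallel computing. As a result, HPC application developers must write different parallel code, for example in CUDA, OpenCL, or OpenMP, for supercomputers that use different accelerator devices. Given the advantages above, OpenACC offers a simple way to address this problem. This thesis focuses on the performance portability of OpenACC applications on NVIDIA Kepler-architecture GPUs and Intel Knights Corner-architecture coprocessors... 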
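To make the directive-based model described above concrete, here is a minimal sketch of an OpenACC-annotated loop in C. It is not taken from the thesis; the function name `saxpy` and the chosen data clauses are illustrative assumptions. An OpenACC-enabled compiler (for example NVIDIA HPC SDK with `-acc`, or GCC with `-fopenacc`) can offload the loop to an accelerator, while a compiler without OpenACC support ignores the `#pragma acc` line and runs the same loop serially on the host. The same source therefore builds on both kinds of system, which is the portability property the abstract refers to.

```c
#include <stddef.h>

/* Illustrative sketch (hypothetical example, not from the thesis):
 * computes y = a*x + y.
 * With an OpenACC compiler the loop may be offloaded to an accelerator;
 * otherwise the pragma is ignored and the loop runs serially on the host,
 * producing the same result. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    /* copyin: x is only read on the device.
     * copy:   y is copied to the device and written back afterwards. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The `restrict` qualifiers tell the compiler that `x` and `y` do not alias. This kind of information helps an OpenACC compiler decide that loop iterations are independent and can run in parallel.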

【Source】: Shanghai Jiao Tong University, Shanghai (Project 211 and Project 985 university, directly under the Ministry of Education)

【Pages】: 90

【Degree】: Master's

【Table of Contents】:
Abstract (Chinese)
ABSTRACT
List of Abbreviations
Chapter 1 Introduction
    1.1 High Performance Computing (HPC)
    1.2 Problem definition
        1.2.1 Portability
        1.2.2 Productivity
        1.2.3 Performance
    1.3 Objectives
    1.4 Summary of contributions
    1.5 Related works
        1.5.1 The investigated work on GPU
        1.5.2 The investigated work on MIC and a Hybrid system
Chapter 2 Programming in Heterogeneous System
    2.1 The Architectures for Heterogeneous System
        2.1.1 Graphics Processing Units (GPUs)
        2.1.2 Intel Many Integrated Core (MIC)
    2.2 Programming languages and frameworks
        2.2.1 Parallel Thread Execution (PTX)
        2.2.2 OpenACC
        2.2.3 What is HMPP Codelet and how to get HMPP codelet, PTX, and MIC machine code files
        2.2.4 Comparison of OpenACC and low-level programming model
        2.2.5 OpenACC 2.0
Chapter 3 Methodologies and Tuning techniques
    3.1 OpenCL optimization
    3.2 OpenACC optimizations
        3.2.1 Independent optimization
        3.2.2 Directive organize optimization
        3.2.3 ILP optimization
        3.2.4 Grid Thread Mapping optimization
        3.2.5 Compiler flags
        3.2.6 Tiling optimization
Chapter 4 Experimental setup
    4.1 Testbed machine
        4.1.1 π Supercomputer
    4.2 Benchmarks
        4.2.1 Rodinia benchmark suite and selected benchmarks
        4.2.2 Hydro Benchmark
        4.2.3 EPCC Benchmark
    4.3 Profiling tools
Chapter 5 Results and Discussion
    5.1 Discussion of each optimizations
        5.1.1 Directive organize optimization and Simple restructuring
        5.1.2 Independent optimization
        5.1.3 ILP optimizations
        5.1.4 Grid thread mapping optimization
        5.1.5 Tiling optimization
        5.1.6 Compiler flags
    5.2 Analysis of each application
        5.2.1 Breadth First Search (BFS)
        5.2.2 Gaussian Elimination (GE)
        5.2.3 Back Propagation (BP)
        5.2.4 LU Decomposition (LUD)
        5.2.5 Hydro Benchmark
Chapter 6 Conclusion
Chapter 7 Future Work
References
Publication