Research on Static Instruction Mapping Algorithms for the EDGE Architecture

Published: 2018-04-05 07:11

Topic: EDGE  Focus: static mapping  Source: Harbin Institute of Technology, master's thesis, 2012

[Abstract]: As the semiconductor industry advances and chip integration density keeps rising, processor design is moving toward tiled organizations. The pressing demand for processor performance has made fully exploiting the instruction-level parallelism (ILP) of programs a trend. Against this background, an explicit dataflow execution model has emerged, known in the industry as the EDGE (Explicit Data Graph Execution) architecture. EDGE is characterized by block-atomic execution and by static placement with dynamic issue. A tiled design needs a mechanism for mapping instructions onto the hardware, and designing that mapping for the best performance is of great importance to EDGE. This thesis summarizes the strengths and weaknesses of existing mapping algorithms and analyzes the factors that affect performance. Based on the principle that increasing bypassing on a node reduces communication latency, it proposes and implements a dependence-first placement algorithm, the DF (Dependence First) algorithm. Test results show that DF scheduling outperforms the best existing algorithm by up to 13% and by 2% on average, which the thesis reports as a significant speedup in application execution. The thesis also refines DF into the DF2 algorithm. Analysis shows that DF has the same complexity as the spatial path scheduling (SPS) algorithm, both O(i²), so DF improves program execution performance without increasing algorithmic complexity or hardware overhead. The thesis further applies DF to different hardware configurations to study the relationship between hardware structure and the algorithm and to locate the processor performance bottleneck under DF. Code generated by DF was run on hardware with 2x bypass bandwidth and on hardware with 2x network bandwidth. The study finds that bypass bandwidth strongly affects DF's performance. The analysis concludes that hardware bypass bandwidth limits DF's performance gains and that, compared with network bandwidth, bypass bandwidth is the key factor affecting the algorithm's performance. Using the same binary generated by DF, merely doubling the bypass bandwidth yields an additional 10% performance improvement.
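
As a rough illustration of the dependence-first idea described in the abstract, the Python sketch below greedily places each instruction on the tile that minimizes the network hops from its producers, so a consumer lands on its producer's tile (where the operand can use the local bypass) whenever a slot is free. This is not the thesis's DF algorithm, whose details this page does not give. The function name `place_dependence_first`, the 2D grid model, the `slots_per_tile` parameter, and the Manhattan hop cost are all assumptions made for illustration.

```python
from itertools import product


def place_dependence_first(deps, rows, cols, slots_per_tile):
    """Hypothetical greedy placement sketch (not the thesis's DF algorithm).

    deps maps each instruction name to the list of its producers and must be
    given in topological order (producers before consumers).
    Returns a dict: instruction name -> (row, col) tile.
    """
    tiles = list(product(range(rows), range(cols)))
    free = {t: slots_per_tile for t in tiles}  # remaining slots on each tile
    placement = {}

    def hops(a, b):
        # Assumed cost model: Manhattan distance on the grid.
        # Zero hops means producer and consumer share a tile (local bypass).
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    for inst, producers in deps.items():
        candidates = [t for t in tiles if free[t] > 0]
        if not candidates:
            raise ValueError("block does not fit on the grid")
        # Pick the free tile with the fewest total hops from all producers;
        # ties go to the first tile in row-major order.
        best = min(
            candidates,
            key=lambda t: sum(hops(placement[p], t) for p in producers),
        )
        placement[inst] = best
        free[best] -= 1
    return placement


# Example: a small diamond-shaped dependence graph on a 2x2 grid.
block = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
layout = place_dependence_first(block, rows=2, cols=2, slots_per_tile=2)
```

In this example `b` shares a tile with its producer `a`, and once that tile is full `c` and `d` are placed together on the adjacent tile. The sketch scans every free tile for every instruction and makes no attempt to match the complexity or the results the thesis reports.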
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: TP301.6;TP332
