Research and Implementation of Global Instruction Scheduling Techniques for the YHFT-Matrix Compiler

Posted: 2018-03-19 10:57

Topic: compiler; Focus: global instruction scheduling; Source: National University of Defense Technology, master's thesis, 2013; Type: degree thesis

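【Abstract】: The Matrix DSP processor is a high-performance DSP with independently owned intellectual property, developed by the Institute of Microelectronics, College of Computer, National University of Defense Technology. Because of its strong data-computation capability, it can be applied in fields such as software base-station wireless communication and underwater acoustic computing. Promoting this processor requires a correct, high-performance compiler system. Making the Matrix compiler perform well in turn requires good optimizations, and optimizations tailored to the Matrix architecture are especially effective. Based on the characteristics of the Matrix architecture, this thesis proposes several optimizations suited to the Matrix compiler. Some of them have already been implemented in the Matrix compiler and adapted to the Matrix architecture, which substantially improves the compiler's optimization performance. The main optimizations introduced and implemented are as follows:

(1) Global instruction scheduling based on selective scheduling. The Matrix processor is a VLIW DSP that can issue 10 instructions simultaneously, so instruction-level parallelism is the key to exploiting its performance. Global instruction scheduling lets the compiler exploit instruction-level parallelism more effectively. Building on GCC's selective scheduling, a correct selective scheduling algorithm was implemented in the Matrix compiler. The version improved for the Matrix architecture is noticeably more effective.

(2) If-conversion. If-conversion turns control flow into data flow (control dependences become data dependences on predicates), which benefits later optimizations, especially those related to instruction scheduling. The Matrix processor supports full predicated execution, so developing if-conversion for the Matrix compiler makes better use of the architecture to exploit the processor's performance. Building on GCC's if-conversion, the Matrix compiler implements the same if-conversion cases as GCC, plus several new cases added for specific applications. If-conversion further improved the Matrix compiler's performance. In particular, after the new cases were added, the execution efficiency of some specific applications improved greatly.

(3) Branch delay scheduling. All branch, jump, and function-call instructions in the Matrix instruction set have four delay slots. If a program leaves these slots unfilled, the pipeline idles and hardware resources are wasted. Building on GCC's delayed-branch scheduling, the Matrix compiler correctly implements branch delay scheduling. The algorithm improved for the Matrix architecture schedules better and fills the delay slots more fully.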
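To make item (1) more concrete, the sketch below packs instructions into VLIW issue bundles. It is plain cycle-by-cycle list scheduling within one basic block. That is a much simpler technique than the selective scheduling (GCC's global scheduler) the thesis builds on. Only the 10-wide issue limit comes from the abstract. The dependence graph, latencies, instruction names, and the `list_schedule` helper are hypothetical.

```python
from collections import defaultdict

ISSUE_WIDTH = 10  # Matrix issues up to 10 instructions per cycle (per the abstract)


def list_schedule(instrs, deps, latency, width=ISSUE_WIDTH):
    """Greedy cycle-by-cycle list scheduling into VLIW bundles (hypothetical helper).

    instrs:  instruction names in original program order
    deps:    instr -> list of instrs whose results it reads (must form a DAG)
    latency: instr -> cycles before its result can be used (>= 1)
    Returns one list per cycle; an empty list is a stall (nop) cycle.
    """
    order = {name: k for k, name in enumerate(instrs)}
    earliest = {name: 0 for name in instrs}          # earliest legal issue cycle
    unscheduled_preds = {name: len(deps.get(name, [])) for name in instrs}
    users = defaultdict(list)                        # reverse dependence edges
    for name in instrs:
        for pred in deps.get(name, []):
            users[pred].append(name)

    ready = [name for name in instrs if unscheduled_preds[name] == 0]
    bundles, cycle, done = [], 0, 0
    while done < len(instrs):
        # Operand-ready candidates, program order as a simple priority, capped at width.
        bundle = sorted((n for n in ready if earliest[n] <= cycle), key=order.get)[:width]
        for name in bundle:
            ready.remove(name)
            for user in users[name]:
                earliest[user] = max(earliest[user], cycle + latency[name])
                unscheduled_preds[user] -= 1
                if unscheduled_preds[user] == 0:
                    ready.append(user)
        bundles.append(bundle)
        done += len(bundle)
        cycle += 1
    return bundles


instrs = ["ld1", "ld2", "add", "mul", "st"]
deps = {"add": ["ld1", "ld2"], "mul": ["ld1"], "st": ["add"]}
lat = {"ld1": 2, "ld2": 2, "add": 1, "mul": 1, "st": 1}
print(list_schedule(instrs, deps, lat))
# [['ld1', 'ld2'], [], ['add', 'mul'], ['st']]
```

A real scheduler would rank candidates by critical-path height rather than program order. It would also model functional-unit classes, not just a single width limit. A global scheduler can additionally move instructions across basic-block boundaries.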
[Abstract]: The Matrix DSP processor is a high-performance DSP developed by the Institute of Microelectronics of the National University of Defense Technology. The processor has strong data-computing capability, so it can be used in software base-station wireless communication and similar fields. Popularizing this processor requires a correct, high-performance compiler system, and making the Matrix compiler perform better requires good optimizations; optimizations tailored to the Matrix architecture are especially effective. Based on the characteristics of the Matrix architecture, this thesis proposes several optimizations suited to the Matrix compiler. Some of them have been implemented in the Matrix compiler and adapted to the Matrix architecture, greatly improving the compiler's optimization performance. The thesis mainly introduces and implements the following optimizations. (1) Global instruction scheduling based on selective scheduling. The Matrix processor is a VLIW DSP that can issue 10 instructions simultaneously, so instruction-level parallelism is the key to exploiting its performance, and global instruction scheduling lets the compiler exploit that parallelism more effectively. Building on GCC's selective scheduling, a correct selective scheduling algorithm is implemented in the Matrix compiler, and the version improved for the Matrix architecture is more effective. (2) If-conversion. If-conversion converts control flow into data flow, which benefits subsequent optimizations, especially those related to instruction scheduling. Because the Matrix processor supports full predicated execution, developing if-conversion for the Matrix compiler makes better use of the characteristics of the Matrix architecture to exploit the processor's performance.
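The following sketch illustrates if-conversion on a fully predicated machine. A two-armed branch (a "diamond") is replaced by one straight-line block, and each arm's instructions are guarded by a predicate register. The IR, the opcode names, the predicate register `p0`, and the helpers `if_convert` and `run` are all hypothetical. They do not reflect the Matrix instruction set or GCC's implementation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Instr:
    dest: str
    op: str                                   # hypothetical opcodes: "add", "sub", "gt"
    srcs: Tuple[str, ...]
    pred: Optional[Tuple[str, bool]] = None   # (predicate register, required value); None = always runs


def if_convert(cond_op, cond_srcs, then_block, else_block, pred_reg="p0"):
    """Replace `if (cond) then_block else else_block` with one predicated block.

    Assumes full predication (every instruction can be guarded) and that
    neither arm writes pred_reg.
    """
    block = [Instr(pred_reg, cond_op, tuple(cond_srcs))]
    block += [Instr(i.dest, i.op, i.srcs, (pred_reg, True)) for i in then_block]
    block += [Instr(i.dest, i.op, i.srcs, (pred_reg, False)) for i in else_block]
    return block


OPS = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b, "gt": lambda a, b: a > b}


def run(block, regs):
    """Interpret straight-line predicated code; squashed instructions act as nops."""
    regs = dict(regs)
    for i in block:
        if i.pred is not None and regs[i.pred[0]] != i.pred[1]:
            continue
        regs[i.dest] = OPS[i.op](*(regs[s] for s in i.srcs))
    return regs


# |a - b| without a branch: if (a > b) x = a - b; else x = b - a;
code = if_convert("gt", ("a", "b"),
                  [Instr("x", "sub", ("a", "b"))],
                  [Instr("x", "sub", ("b", "a"))])
print(run(code, {"a": 3, "b": 7})["x"])  # 4
```

Both arms now sit in one basic block, so a scheduler can place then-arm and else-arm instructions in the same issue bundle. A real pass must also decide whether conversion is profitable, since the predicated code issues the instructions of both arms.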
Building on GCC's if-conversion, the Matrix compiler implements the same kinds of if-conversion as GCC, and new convertible cases were added for specific applications. If-conversion further improved the performance of the Matrix compiler; after the new cases were added, the execution efficiency of some specific applications improved greatly. (3) Branch delay scheduling. All branch, jump, and function-call instructions in the Matrix instruction set have four delay slots, and leaving these slots unfilled idles the pipeline and wastes hardware resources. Building on GCC's branch delay scheduling, the Matrix compiler correctly implements branch delay scheduling, and the algorithm improved for the Matrix architecture schedules better and fills the delay slots more fully.
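As a simplified illustration of filling the four delay slots, the sketch below moves independent instructions from just before a branch into its slots. Any unused slots are padded with `nop`s. It assumes each slot holds one instruction; on a VLIW machine a slot may instead correspond to a whole issue cycle. It also fills only from the same basic block, whereas GCC's delayed-branch scheduler can also draw candidates from the branch target or the fall-through path. The `Insn` record, the `insn` helper, and the register names are hypothetical.

```python
from dataclasses import dataclass

DELAY_SLOTS = 4  # from the abstract: branches, jumps and calls all have four delay slots


@dataclass(frozen=True)
class Insn:
    text: str
    defs: frozenset = frozenset()   # registers written
    uses: frozenset = frozenset()   # registers read


def insn(text, defs=(), uses=()):
    return Insn(text, frozenset(defs), frozenset(uses))


NOP = insn("nop")


def conflicts(a, b):
    """True if a and b have a RAW, WAR or WAW register dependence."""
    return bool(a.defs & b.uses or a.uses & b.defs or a.defs & b.defs)


def fill_delay_slots(body, branch, slots=DELAY_SLOTS):
    """Move up to `slots` instructions from `body` (the non-branch instructions
    preceding `branch` in its basic block) into the branch's delay slots.

    Scanning backwards, an instruction may move only if it conflicts neither with
    the branch nor with any instruction between it and the branch that stays put.
    Returns: remaining body + [branch] + delay-slot contents (padded with nops).
    """
    moved, stays_after = set(), []
    for idx in range(len(body) - 1, -1, -1):
        cand = body[idx]
        if (len(moved) < slots and not conflicts(cand, branch)
                and not any(conflicts(cand, k) for k in stays_after)):
            moved.add(idx)
        else:
            stays_after.append(cand)
    remaining = [ins for idx, ins in enumerate(body) if idx not in moved]
    delay = [body[idx] for idx in sorted(moved)]   # keep original relative order
    return remaining + [branch] + delay + [NOP] * (slots - len(delay))


body = [insn("ld r1", defs={"r1"}, uses={"r0"}),
        insn("add r2", defs={"r2"}, uses={"r1"}),
        insn("cmp p0", defs={"p0"}, uses={"r3"}),
        insn("mul r4", defs={"r4"}, uses={"r5"})]
out = fill_delay_slots(body, insn("br p0", uses={"p0"}))
print([i.text for i in out])
# ['cmp p0', 'br p0', 'ld r1', 'add r2', 'mul r4', 'nop']
```

The compare that defines the branch predicate must stay ahead of the branch. The other three instructions execute in the delay slots, so only one slot is wasted on a `nop`.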
【Degree-granting institution】: National University of Defense Technology
【Degree level】: Master's
【Year conferred】: 2013
【CLC classification】: TP314; TP332
