Research and Implementation of Key GCC-Based Compilation Optimization Techniques for Matrix2 DSP
Published: 2018-08-18 13:58
[Abstract]: The Matrix2 DSP is a high-performance 64-bit floating-point digital signal processor with independent intellectual property rights, designed by the Institute of Microelectronics, College of Computer, National University of Defense Technology. It offers strong data-processing capability, high operating speed, and strong parallel processing capability, and is mainly used in digital signal processing fields such as weather forecasting and graphics and image processing. To support high-level language application development for the Matrix2 DSP, the research group developed the Matrix2 DSP compiler on top of the open-source compiler GCC-4.7.0. Because the Matrix2 DSP uses a VLIW architecture, how much of its computing power can be exploited depends to a large extent on the quality of the compiler's optimizations. Drawing on the architectural features and instruction set characteristics of the Matrix2 DSP, this thesis improves the Matrix2 compiler in three areas: candidate functional unit allocation, branch delay slot scheduling, and irregular instruction mapping. Together these changes considerably improve the compilation performance of the Matrix2 DSP compiler. The main research content and contributions are as follows.

A candidate functional unit allocation algorithm was designed and implemented for the Matrix2 DSP compiler. The Matrix2 DSP hardware does not allocate functional units itself; instead, the compiler must assign each instruction a suitable execution unit from that instruction's candidate functional units. Building on GCC's instruction constraint matching mechanism, the thesis proposes an allocation scheme that takes the instruction word as the basic unit of allocation and jointly considers the candidate functional units of the current instruction and the resources that are still free, and implements it in the Matrix2 DSP compiler. The algorithm makes up for a shortcoming of GCC, helps the compiler better exploit instruction-level parallelism, and improves the hardware utilization and program execution performance of the Matrix2 DSP.

A branch delay slot scheduling optimization algorithm was designed and implemented for the Matrix2 DSP compiler. In the Matrix2 DSP instruction set, conditional branch, unconditional branch, function call, and function return instructions all have six delay slots, so filling as many delay slots as possible is very important for processor performance. Building on GCC's branch delay slot scheduling, the thesis proposes an optimization algorithm whose main elements are modifying the search region for candidate fill instructions, relaxing the restrictions on instructions that may fill delay slots, and adding scheduling implementation functions, and implements it in the Matrix2 DSP compiler. The algorithm raises the fill rate of branch delay slots and effectively reduces the delay overhead caused by branches.

Support for irregular instruction mapping was designed and implemented in the Matrix2 DSP compiler. The Matrix2 DSP instruction set contains a large number of irregular instructions whose operand types are not uniform, and existing GCC does not support mapping such instructions. Building on GCC's instruction mapping mechanism and the characteristics of irregular instructions, the thesis modifies the C standard rules for checking and converting arithmetic operand types for consistency and adds support for irregular instruction mapping to the RTL instruction expander, so that the Matrix2 DSP compiler maps irregular instructions correctly and efficiently.
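The per-instruction-word functional unit allocation described in the abstract can be pictured with a small sketch. This is not the thesis's algorithm, whose details the abstract does not give. It is a minimal illustration, assuming that each instruction in a word lists its candidate units and that every unit can issue at most one instruction per word. The function name `allocate_units` and the unit names used in the usage note are hypothetical.

```python
from typing import Dict, List, Optional


def allocate_units(word: Dict[str, List[str]]) -> Optional[Dict[str, str]]:
    """Assign every instruction of one instruction word a distinct unit.

    `word` maps an instruction name to its list of candidate functional
    units. Returns a mapping instruction -> unit, or None when the
    instructions cannot all issue in the same word.
    """
    # Assumption: place the most constrained instructions (fewest
    # candidate units) first, so flexible ones take what is left.
    order = sorted(word, key=lambda insn: len(word[insn]))
    assignment: Dict[str, str] = {}
    busy = set()  # units already taken in this instruction word

    def place(i: int) -> bool:
        if i == len(order):
            return True  # every instruction has a unit
        insn = order[i]
        for unit in word[insn]:
            if unit in busy:
                continue  # unit is not free in this word
            busy.add(unit)
            assignment[insn] = unit
            if place(i + 1):
                return True
            # Undo the choice and try the next candidate unit.
            busy.remove(unit)
            del assignment[insn]
        return False

    return assignment if place(0) else None
```

For example, calling `allocate_units({"add": ["ALU0", "ALU1"], "mul": ["MAC0"], "shift": ["ALU0"]})` places `shift` on `ALU0` and `add` on `ALU1`, while two instructions that both accept only `ALU0` yield `None`, meaning they would have to go into separate instruction words. The backtracking step is a simplification chosen for clarity; a production compiler would more likely use a cheaper heuristic tied to its scheduler state.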
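[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year degree conferred]: 2014
[Classification number]: TP314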
Article ID: 2189683
Article link: https://www.wllwen.com/falvlunwen/zhishichanquanfa/2189683.html