Research on Key Techniques for Conditional Branch Instruction Processing in Processors

Published: 2018-01-10 10:10

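Keywords: research on key techniques for conditional branch instruction processing in processors　Source: Zhejiang University, 2013 doctoral dissertation　Type: degree thesis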

[Abstract]: As applications demand ever higher processor performance, techniques such as superscalar issue, very deep pipelines, and speculative execution have been adopted in processor design to raise instruction parallelism. Conditional branch instructions combine conditional execution with control of program flow and therefore reduce that parallelism, so handling them efficiently is a precondition for these techniques to reach their potential. This dissertation studies several key performance-oriented techniques for conditional branch processing. The main research contents and contributions are:
1. Branch prediction based on dynamic inversion of prediction polarity. By studying the temporal locality of branch mispredictions, a method is proposed that dynamically monitors the raw misprediction rate (before any polarity inversion), identifies misprediction peaks in which that rate exceeds a threshold, and inverts the prediction polarity during those peaks. The final misprediction rate after inversion stays low, which improves overall prediction accuracy. The method addresses problems such as branch jitter that traditional prediction methods based on branch aliasing cannot solve.
2. Branch prediction based on multi-level filtering. By studying the spatial locality of branch mispredictions, a method is proposed that divides the prediction space into multiple levels, dynamically monitors the misprediction rate of each level, and filters the small number of error-prone branches concentrated in each level down to the next level for targeted handling. This lowers the misprediction rate of each level and thereby improves overall accuracy. The method addresses the low resource utilization of traditional multi-path prediction methods, in which every path must handle all conditional branches.
3. Parallel branch prediction based on multi-level buffering and on adaptive prediction granularity. First, a method based on multi-level buffering is proposed. It accesses the predictor during branch-idle cycles, prefetches and caches the prediction information of upcoming branches, and, when several conditional branches appear at the same time, selects the matching entries from the cached information and assigns them to the individual branches. This method achieves high-accuracy parallel branch prediction at fetch bandwidths of 8 or less. Second, a method based on adaptive prediction granularity is proposed. According to the fetch bandwidth and branch behavior, it adaptively groups several conditional branches into an instruction packet, uses the packet as the unit of prediction, and maintains history information per packet. This method achieves high-accuracy parallel branch prediction at any fetch bandwidth.
4. Loop acceleration based on decode-buffer reuse and skip-level PC transfer. Targeting the characteristics of loop bodies, a loop acceleration method is proposed. Skip-level PC transfer makes a multi-entry decode buffer feasible; the buffer is then reused so that, while a loop runs, loop-body instructions are sent to the execution units from the buffer, which accelerates loop execution. A self-loop wide-issue technique further addresses factors that limit loop performance, such as the distribution of loop-body instructions, loop linkage, and cache width limits.
The key techniques proposed in this dissertation have theoretical significance and practical value for improving the performance of conditional branch instruction processing.
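
The polarity-inversion idea in item 1 of the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the dissertation: the 2-bit saturating-counter baseline, the class and function names, the sliding window of 8 outcomes, and the 0.5 threshold. The sketch only shows the general mechanism the abstract describes: track the raw (uninverted) misprediction rate and flip the prediction while that rate is above a threshold.

```python
from collections import deque


class TwoBitPredictor:
    """Assumed baseline: one 2-bit saturating counter per table entry."""

    def __init__(self, entries=16):
        self.table = [2] * entries  # counter range 0..3, start at "weakly taken"

    def predict(self, pc):
        return self.table[pc % len(self.table)] >= 2

    def update(self, pc, taken):
        i = pc % len(self.table)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


class PolarityInverter:
    """Wraps a base predictor and flips its output during misprediction peaks."""

    def __init__(self, base, window=8, threshold=0.5):
        self.base = base
        self.recent = deque(maxlen=window)  # 1 = raw prediction was wrong
        self.threshold = threshold          # hypothetical peak threshold

    def predict(self, pc):
        raw = self.base.predict(pc)
        full = len(self.recent) == self.recent.maxlen
        # A "peak" is a full window whose raw error rate exceeds the threshold.
        peak = full and sum(self.recent) / len(self.recent) > self.threshold
        return (not raw) if peak else raw

    def update(self, pc, taken):
        # Always record the RAW outcome, so monitoring is independent of inversion.
        raw = self.base.predict(pc)
        self.recent.append(int(raw != taken))
        self.base.update(pc, taken)


def count_mispredictions(predictor, trace):
    """Run a trace of (pc, taken) pairs and count wrong final predictions."""
    misses = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses
```

On a single branch that alternates not-taken/taken for 100 executions, the assumed 2-bit baseline mispredicts every time, while the wrapped predictor mispredicts only the first 8 executions, which is the time needed to fill the monitoring window. A real design would have to choose the window, threshold, and monitoring scope (global or per branch) with care; the abstract does not specify them.
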
[Degree-granting institution]: Zhejiang University
[Degree level]: Doctoral
[Year conferred]: 2013
[Classification number]: TP332

[References]