Research on Dynamic Instruction Mapping Algorithms for the EDGE Architecture
[Abstract]: The centralized structures found throughout out-of-order superscalar processors have seriously limited further gains in microprocessor performance. EDGE (Explicit Data Graph Execution) is one of the models proposed to address this bottleneck; its structural model abandons the energy-hungry centralized structures of superscalar designs. In a distributed EDGE architecture, instructions are mapped onto multiple tiles and execute concurrently. Transferring operands between tiles incurs delay, which degrades performance. An instruction mapping algorithm tries to recover the performance lost to this distribution by carefully balancing program parallelism against inter-tile communication delay. The TRIPS microprocessor adopts a topology in which critical resources are distributed asymmetrically, together with a static instruction mapping algorithm (SPDI, Static Placement Dynamic Issue). This leads to substantial load imbalance and operand-network communication hot spots on the ETs (Execute Tiles), and thus to lower IPC. In this thesis, an EDGE architecture similar to TRIPS is implemented in the M5-EDGE simulator to study the dynamic instruction mapping algorithm Deep. Without compiler scheduling, the Deep algorithm with cyclic mapping reaches 85% and 98.3% of the performance of SPDI at issue widths of 1 and 2, respectively. Taking the topological positions of the RTs (Register Tiles) and DTs (Data-cache Tiles) into account, three optimizations of Deep mapping are evaluated, which give priority to ETs selected by ET number order, by zigzag ("之"-shaped) order, and by the summed hop count of each block's global communication. At an issue width of 1, the average hop count under these optimizations is 2.63% and 4.70% lower than under the basic Deep algorithm, while IPC rises by 1.07% and 2.11%, respectively.
This shows that reducing the hop count of inter-instruction communication under Deep mapping improves IPC. Under the Deep mapping algorithm, more than 90% of operands are delivered through the local bypass, which greatly reduces the load on the operand network. When the bypass width is twice the issue width, the local operand transfer delay falls almost to 0, so widening the local bypass effectively reduces operand transfer delay. For RT position optimization, ETs are assigned by number, and IPC increases by 1.77 over the basic Deep mapping algorithm. For DT position optimization, two strategies are evaluated: giving priority to ETs near the DTs, and giving priority by the computed sum of VBS hops. They raise IPC over basic Deep mapping by 1.17% and 1.89%, respectively. When the RTs and DTs are folded into the ETs to form a 4x4 topology, the IPC of Deep mapping is 97.18% and 113.42% of that of SPDI at issue widths of 1 and 2, respectively; the corresponding ratios for ETs are 97.32% and 114.06%. System IPC therefore improves noticeably when topological distances shrink or when the Deep mapping algorithm reduces the number of communication hops.
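The link between tile ordering and communication hops can be illustrated with a small sketch. It is not the thesis's implementation: it assumes a 4x4 grid of execute tiles, Manhattan distance as the hop count, a cyclic (round-robin) assignment of instructions to tiles, and that the shape-based ordering mentioned in the abstract is a serpentine (zigzag) traversal of the grid. All function names (`numbered_order`, `zigzag_order`, `cyclic_map`, `total_hops`) are hypothetical.

```python
from itertools import cycle


def manhattan_hops(a, b):
    """Hop count between two tiles on a mesh, assumed to be Manhattan distance."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def numbered_order(rows, cols):
    """Visit tiles row by row, left to right (plain numbering order)."""
    return [(r, c) for r in range(rows) for c in range(cols)]


def zigzag_order(rows, cols):
    """Visit tiles row by row, reversing direction on every other row."""
    order = []
    for r in range(rows):
        columns = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        order.extend((r, c) for c in columns)
    return order


def cyclic_map(num_instructions, tile_order):
    """Assign instruction i to the i-th tile of tile_order, wrapping around."""
    tiles = cycle(tile_order)
    return [next(tiles) for _ in range(num_instructions)]


def total_hops(placement, edges):
    """Sum the hops over all producer-to-consumer operand transfers."""
    return sum(manhattan_hops(placement[p], placement[c]) for p, c in edges)


if __name__ == "__main__":
    # Hypothetical block: a dependence chain of 8 instructions.
    chain = [(i, i + 1) for i in range(7)]
    by_number = cyclic_map(8, numbered_order(4, 4))
    by_zigzag = cyclic_map(8, zigzag_order(4, 4))
    print("numbered order hops:", total_hops(by_number, chain))
    print("zigzag order hops:  ", total_hops(by_zigzag, chain))
```

For this toy dependence chain, the serpentine order keeps consecutive instructions on adjacent tiles, whereas the plain numbering order pays a long hop at each row boundary. Real blocks have richer dataflow graphs, so the gap measured in the thesis is much smaller than in this example.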
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: TP332; TP301.6
Article ID: 2141553
Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2141553.html