Research on Sampling Acceleration Strategies for Multi-Core Microarchitecture Simulation

Published: 2018-03-14 16:40

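  Topic: microarchitecture simulation  Focus: multi-core processors  Author: Jiang Chuntao (姜春涛)  Source: Huazhong University of Science and Technology, 2016 doctoral dissertation  Type: degree thesis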


[Abstract]: Microarchitecture simulation plays an important role in computer architecture design. In both industry and academia it is an indispensable technique, because designers rely on it to explore a vast design space and evaluate large numbers of candidate designs in order to approach or reach the optimal design. Unfortunately, slow simulation speed has been the bottleneck of this technique for decades. In the multi-core/many-core era the problem has become more acute, for two main reasons: (1) multi-core systems, with more components and more refined designs, create a much larger design space to explore; (2) evaluating, validating, and stress-testing multi-core/many-core systems requires simulating larger and more complex multi-threaded benchmark programs. Research on accelerating multi-core microarchitecture simulation is therefore of significant academic and practical value.

Sampled simulation is a popular, widely used, and effective acceleration strategy. It simulates a small, carefully chosen set of program samples and infers from them the performance of the whole program on the system, which greatly shortens the simulation and evaluation cycle and speeds up design validation. Sampled simulation for single-core systems is already fairly mature. It selects samples by dynamic instruction count; for example, a sample is usually defined as a fixed number of instructions. This class of techniques is therefore called Instruction-Based Sampling (IBS). When applied to multi-core simulation, however, IBS performs poorly and can even produce incorrect evaluations: synchronization between threads makes the dynamic instruction count of a multi-threaded benchmark nondeterministic across runs, which undermines the basic premise of IBS. Time-Based Sampling (TBS) emerged in response. Unlike IBS, TBS takes fixed-length intervals of execution time as samples and is better suited to evaluating multi-core systems that run multi-threaded benchmarks. Compared with traditional IBS, however, TBS is far from mature and faces several challenges: accurate sample selection is difficult, any single sampling strategy performs poorly, and functional warmup is costly. This dissertation studies TBS for multi-core microarchitecture simulation in depth to address these problems.

First, to address the difficulty of accurate sample selection in TBS, the dissertation proposes PCantorSim, a sampling strategy that uses the fractal behavior of multi-threaded benchmarks to guide sample selection. PCantorSim avoids the complex preprocessing of traditional sample selection strategies, improves sampling efficiency, and is broadly applicable. Specifically, the work finds that multi-threaded benchmarks exhibit not only periodic phase behavior but also self-similar fractal behavior: the runtime behavior of a program observed at different time scales is self-similar. Based on this finding, PCantorSim selects representative sample segments quickly and accurately, substantially shortening sampled simulation time. In the evaluation, programs from the PARSEC multi-core benchmark suite run on a simulated 8-core system. Compared with unsampled full detailed simulation, PCantorSim speeds up simulation by 20x, with an average execution-time prediction error of only 5.3%.

Second, because a single sampling strategy cannot fully exploit the advantages of TBS, the dissertation proposes THS (Two-level Hybrid Sampling), a multi-level sampling strategy that combines segment-based and fractal-based sampling. A detailed comparison of the individual sampling strategies used in TBS reveals several previously unreported observations: (1) accurately predicting the IPC (Instructions Per Cycle) of the fast-simulation phases matters more than predicting the IPC of the detailed-simulation phases; (2) the accuracy of the fast-simulation IPC prediction is jointly determined by the sample selection strategy and the fast-simulation IPC prediction algorithm; (3) when the selected sample segments are short, fractal-based sampling (Cantor Sampling) is more accurate, whereas when they are long, segment-based periodic sampling (Periodic Sampling) is more accurate; (4) random sampling (Random Sampling) is not suitable for TBS. Based on these findings, THS exploits the strengths of the individual strategies while avoiding their weaknesses, and so delivers better evaluation accuracy and simulation speedup from TBS. Experiments show that THS achieves an average execution-time prediction error of 4% and a simulation speedup of 40x. Further evaluation shows that THS also has high cross-microarchitecture accuracy and can effectively guide the choice among multi-core microarchitecture design alternatives.

Finally, to address the high cost of functional warmup in TBS, the dissertation proposes SOL (Shorter On-Line Warmup), an online functional warmup acceleration mechanism. SOL uses a two-stage design. In the first stage, the Prime strategy selects a functional warmup segment of appropriate length. Within that segment, an extended and optimized NSL (No-State-Loss) warmup strategy is then applied, which reduces warmup cost while maintaining good warmup quality. By exploring and tuning the SOL parameters, a reasonable combination of warmup parameters is determined that balances evaluation accuracy against simulation speedup. Experiments show that SOL is broadly applicable: it can be integrated into several existing TBS strategies, quickly warms up the functional components in sampled simulation, and increases simulation speedup while preserving simulation accuracy.
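The following Python sketch illustrates the general idea behind sampled simulation as the abstract describes it: only some intervals are simulated in detail, and the time spent in the remaining fast-simulated intervals is estimated from a predicted IPC. It is an illustration under stated assumptions, not the dissertation's method. The `Interval` type, the function names, and the last-value IPC predictor (reuse the IPC of the most recent detailed sample) are all hypothetical choices made for this example; the abstract does not specify the prediction algorithms used by PCantorSim or THS.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Interval:
    """One slice of the run, in program order (hypothetical type)."""
    instructions: int             # instructions executed in this slice
    cycles: Optional[int] = None  # measured only for detailed samples


def predict_total_cycles(intervals: List[Interval]) -> float:
    """Estimate whole-run cycles from detailed and fast-simulated slices.

    Assumption: a fast-simulated slice inherits the IPC of the most
    recent detailed sample (or of the first detailed sample if none
    precedes it). This is a placeholder predictor, not the thesis's.
    """
    ipcs = [
        iv.instructions / iv.cycles if iv.cycles is not None else None
        for iv in intervals
    ]
    if all(ipc is None for ipc in ipcs):
        raise ValueError("need at least one detailed sample")

    # Seed the predictor with the first available detailed IPC.
    last_ipc = next(ipc for ipc in ipcs if ipc is not None)
    total = 0.0
    for iv, ipc in zip(intervals, ipcs):
        if ipc is not None:
            total += iv.cycles            # measured, no prediction needed
            last_ipc = ipc
        else:
            total += iv.instructions / last_ipc  # predicted cycles
    return total


def relative_error(predicted: float, actual: float) -> float:
    """Execution-time prediction error as a fraction of the true value."""
    return abs(predicted - actual) / actual


if __name__ == "__main__":
    run = [
        Interval(1000, 500),  # detailed sample, IPC 2.0
        Interval(4000),       # fast-simulated
        Interval(900, 600),   # detailed sample, IPC 1.5
        Interval(3000),       # fast-simulated
    ]
    print(predict_total_cycles(run))  # 5100.0
```

In this toy run the two fast-simulated slices contribute 4000 of the 5100 predicted cycles, which shows why the abstract stresses that the accuracy of the fast-simulation IPC prediction dominates the overall error.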

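[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Doctoral
[Year conferred]: 2016
[Classification number]: TP303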

Article ID: 1612054


Article link: https://www.wllwen.com/shoufeilunwen/xxkjbs/1612054.html

