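Design and Implementation of a Stream Core Software Simulator for the Homogeneous General-Purpose Stream Multi-Core Architecture
Published: 2018-06-23 02:53
Topic: homogeneous general-purpose stream architecture + stream core; Reference: National University of Defense Technology, master's thesis, 2012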
[Abstract]: Multi-core architectures are developing rapidly, from designs with a small number of complex cores to simple many-core designs aimed at compute-intensive applications. In recent years, heterogeneous CPU+GPU architectures have emerged in industry in an attempt to exploit the strengths of both. Such architectures consume considerable power, however, and the separate memories of the CPU and GPU create a performance bottleneck. On-chip fusion solved the data communication latency caused by separate memories, yet a heterogeneous fused architecture still cannot make full use of chip resources. Against this background, the author's research group proposed the homogeneous general-purpose stream processor architecture: multiple homogeneous stream multi-cores are integrated on one chip, and each stream multi-core can be configured, according to the application, as a CPU or as part of a stream processor. On-chip shared memory eliminates the data transfer overhead caused by separate CPU and stream processor memories, 64-bit RISC cores improve programmability, and dynamically configuring the function of the stream multi-cores raises chip resource utilization.

Simulators play a pivotal role in the design and study of modern processors. Early in the design, a simulator can be used to judge whether the architecture meets its functional requirements. Once the RTL model has been coded, it enables hardware/software co-verification. Before tape-out, it provides a simulation environment for upper-layer software, so that system software, compilers and applications can be developed ahead of time. A simulator can also collect detailed statistics while an application runs, which helps with application optimization and architecture research.

Motivated by the research group's need to study the homogeneous general-purpose stream processor architecture, and by the importance of simulators to architecture research and to the development of compilers and other software, this thesis designs and implements an architecture simulator for the MB64 stream core, the basic unit of that architecture.

The main work of this thesis comprises the following three points:
1. An MB64 functional simulator was designed and implemented. It correctly loads ELF files in a cross-endian setting, correctly executes branch instructions with branch delay slots, and collects simple statistics, providing an experimental environment for designing a compiler for this architecture.
2. An MB64 performance simulator was designed and implemented. It uses a speculation-based, dynamically scheduled pipeline, supports a 2-bit branch prediction algorithm and a BTB, and uses the Tomasulo dynamic scheduling algorithm. It accurately simulates branch instructions with branch delay slots. When recovering from a wrong path, it preserves the atomic relationship between a delay-slot instruction and its branch instruction, and between the special imm instruction and the immediate-type instruction that follows it. It collects detailed execution and timing statistics, laying a foundation for architecture research.
3. The characteristics of the MB64 architecture were analyzed. Experiments were run on architectural parameters such as cache capacity, associativity and BTB size. The effect of parameter changes on performance was recorded and weighed against hardware cost in order to select the best parameters. Finally, fast-forwarding and a dynamic decode cache were used to raise the simulation speed of MB64Sim.
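The 2-bit branch prediction with a BTB mentioned in point 2 can be illustrated with a minimal Python sketch. The abstract does not give MB64Sim's table sizes, indexing scheme or update policy, so everything here is an assumption made for illustration: the names `TwoBitPredictor` and `count_mispredictions`, the direct-mapped tables, the default sizes and the 4-byte instruction width. The sketch does not model branch delay slots or wrong-path recovery.

```python
class TwoBitPredictor:
    """2-bit saturating-counter predictor with a direct-mapped BTB (illustrative)."""

    def __init__(self, counter_entries=1024, btb_entries=256):
        # Counter states 0 and 1 predict not-taken; 2 and 3 predict taken.
        # Every counter starts at 1 (weakly not-taken); this is an assumption.
        self.counter_entries = counter_entries
        self.btb_entries = btb_entries
        self.counters = [1] * counter_entries
        self.btb = [None] * btb_entries  # each entry is (branch_pc, target) or None

    def _counter_index(self, pc):
        # Assumes 4-byte instructions, so the low two PC bits are dropped.
        return (pc >> 2) % self.counter_entries

    def _btb_index(self, pc):
        return (pc >> 2) % self.btb_entries

    def predict(self, pc):
        """Return (taken, target); target is None unless predicted taken with a BTB hit."""
        taken = self.counters[self._counter_index(pc)] >= 2
        entry = self.btb[self._btb_index(pc)]
        if taken and entry is not None and entry[0] == pc:
            return True, entry[1]
        # Without a known target the front end can only fall through.
        return False, None

    def update(self, pc, taken, target):
        """Train the counter with the real outcome; record the target of taken branches."""
        i = self._counter_index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
            self.btb[self._btb_index(pc)] = (pc, target)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_mispredictions(predictor, pc, target, outcomes):
    """Replay a sequence of outcomes for one branch and count wrong predictions."""
    misses = 0
    for taken in outcomes:
        predicted_taken, predicted_target = predictor.predict(pc)
        if predicted_taken != taken or (taken and predicted_target != target):
            misses += 1
        predictor.update(pc, taken, target)
    return misses


if __name__ == "__main__":
    # A loop branch taken nine times and then not taken, executed twice.
    loop_outcomes = ([True] * 9 + [False]) * 2
    print(count_mispredictions(TwoBitPredictor(), 0x1000, 0x0F00, loop_outcomes))
```

For this loop pattern the sketch mispredicts 3 of the 20 branches: the cold first iteration and the two loop exits. Because a single not-taken outcome only moves the counter from 3 to 2, the branch is still predicted taken when the loop is entered again, which is the usual argument for 2-bit over 1-bit counters.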
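[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: TP332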
Article ID: 2055428
Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2055428.html