Research on Data Layout Methods for Hybrid On-Chip High-Speed Memory

Published: 2018-07-06 12:03

Topic of this thesis: SPM data allocation + Cache behavior analysis; Reference: Shandong University, master's thesis, 2014

[Abstract]: In recent years, embedded systems have developed rapidly with the growth of Internet of Things technology and smart embedded devices. Embedded technology is increasingly applied in fields that deeply affect daily life, such as wireless communication, smartphones, medical technology, and intelligent buildings. Today's embedded devices place higher demands on running efficiency, continuous operating time, and stability, so optimizing computing performance and energy consumption is an important concern in embedded system design. To ease the mismatch between CPU speed and memory read/write speed, computer systems introduced on-chip caching, and the two commonly used kinds of on-chip static random-access memory (SRAM), the on-chip cache and scratchpad memory (SPM), are now widely used in embedded systems. In data-intensive programs the memory subsystem is the performance and energy bottleneck of the whole system, so optimizing it is a key consideration in designing high-performance, energy-efficient embedded systems. Although many embedded systems already combine cache and SPM as on-chip RAM, many existing SPM data optimization algorithms target pure-SPM architectures only and do not apply to hybrid SPM and cache architectures. Against the background of a hybrid on-chip SPM and cache architecture, and focusing on optimizing the performance and energy consumption of hybrid on-chip memory, this thesis proposes an SPM and cache data allocation optimization algorithm based on cache behavior analysis. The main work is as follows:
(1) The performance and energy consumption of embedded systems are optimized by studying the SPM data allocation problem under a hybrid SRAM architecture. The thesis proposes an optimal solution based on integer linear programming (ILP). The formulation considers not only how frequently data is accessed in the cache but also the conflict behavior of memory blocks when they miss in the cache, and ILP is then used to find the SPM allocation with the highest performance or the lowest energy consumption. Compared with a pure-SPM architecture, experimental results show that the proposed hybrid-memory optimization algorithm makes better use of on-chip memory.
(2) A cache behavior analysis model based on data cache traces is proposed. The thesis adopts and extends the theory of the temporal conflict set (TCS) as a model for precise analysis of cache behavior. Unlike analysis models based on cache conflict graphs, this model uses TCS as the basis of cache analysis: for each cache miss the algorithm computes a detailed conflict sequence, and the ILP formulation then precisely computes how different SPM allocations of memory blocks change cache behavior.
(3) To make the most of the SPM, the thesis finally proposes a fine-grained array splitting algorithm based on memory blocks. Each array can be divided into several parts, some of which are mapped to the SPM while others are assigned to external memory. This fine-grained splitting improves system performance and reduces system energy consumption to a greater degree.
(4) The optimization scheme is integrated into a unified compilation framework. The results produced by the ILP optimizer are converted into a linker script, which the compiler then uses to rebuild the program into an optimized executable. The work is based on a modified version of the SimpleScalar simulation tool.
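The interplay between points (1) and (2) of the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation and rests on several assumptions: a direct-mapped cache in which each memory block occupies one line, blocks identified by integer block numbers, accesses to SPM-resident blocks bypassing the cache, and hypothetical latency values. Exhaustive search stands in for the ILP solver, which is only practical for a handful of blocks. All function and constant names are illustrative.

```python
from itertools import combinations

# Hypothetical per-access latencies in cycles (assumed, not from the thesis).
SPM_COST, HIT_COST, MISS_COST = 1, 1, 20


def conflict_sets(trace, num_sets):
    """Pair every access with the blocks that touched the same cache set
    since the previous access to that block. None marks a first access."""
    last_seen = {}
    accesses = []
    for i, block in enumerate(trace):
        if block not in last_seen:
            accesses.append((block, None))
        else:
            between = trace[last_seen[block] + 1:i]
            rivals = frozenset(
                b for b in between if b % num_sets == block % num_sets
            )
            accesses.append((block, rivals))
        last_seen[block] = i
    return accesses


def cost(accesses, in_spm):
    """Total cycles for one allocation. A cached block hits only when every
    block in its conflict set has been moved to the SPM."""
    total = 0
    for block, rivals in accesses:
        if block in in_spm:
            total += SPM_COST
        elif rivals is not None and rivals <= in_spm:
            total += HIT_COST
        else:
            total += MISS_COST
    return total


def best_allocation(trace, sizes, spm_capacity, num_sets):
    """Exhaustive search (a stand-in for ILP) over allocations that fit."""
    accesses = conflict_sets(trace, num_sets)
    blocks = sorted(sizes)
    best = (cost(accesses, frozenset()), frozenset())
    for k in range(1, len(blocks) + 1):
        for subset in combinations(blocks, k):
            if sum(sizes[b] for b in subset) <= spm_capacity:
                chosen = frozenset(subset)
                candidate = cost(accesses, chosen)
                if candidate < best[0]:
                    best = (candidate, chosen)
    return best


# Blocks 0 and 4 share a cache set and evict each other; block 1 does not.
trace = [0, 4, 0, 4, 0, 4, 1, 1, 1]
sizes = {0: 1, 1: 1, 4: 1}
print(best_allocation(trace, sizes, spm_capacity=1, num_sets=4))
```

In this trace all three blocks are accessed equally often, so access frequency alone cannot rank them. Accounting for conflict sets shows that moving block 0 (or, equivalently, block 4) into the SPM also turns the other block's conflict misses into hits, which is the kind of effect the conflict-aware formulation is meant to capture.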
[Degree-granting institution]: Shandong University
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP333
