Research on Key Techniques for Last-Level Cache Performance Optimization (末级高速缓存性能优化关键技术研究)
[Abstract]: Modern processors generally adopt a multi-level cache hierarchy to bridge the widening performance gap between the processor and main memory. Unlike the first-level caches, which are private and split between instructions and data, the shared last-level cache (LLC) only receives accesses that have already been filtered by the inner cache levels, so the data reaching the LLC exhibits relatively poor locality. Management strategies designed for small private first-level caches therefore make poor use of LLC capacity, which seriously limits the memory performance of the processor. Managing the LLC effectively and reducing LLC misses is thus of great significance for improving overall system performance.
The operating system is responsible for allocating physical memory and establishing the virtual-to-physical address mappings. By modifying its physical page-frame allocation policy, the OS can influence the layout of data in the last-level cache, improve data locality, and reduce LLC misses. Compared with traditional LLC optimizations based on hardware design or compiler techniques, this approach requires little hardware modification and is transparent to applications. However, existing operating systems were not designed with LLC optimization in mind and lack effective means to control and manage the LLC. Starting from two directions, the design of operating-system memory-management policies and the software/hardware co-design of the LLC, this thesis studies key techniques for last-level cache performance optimization. The main research work and contributions are as follows:
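The page-frame-allocation lever described above is the classic page-coloring observation: on many processors the LLC set index overlaps the physical page frame number, so the OS can choose which group of LLC sets a page falls into simply by choosing which frame it allocates. Below is a minimal sketch in C, assuming a hypothetical 4 MiB, 16-way LLC with 64-byte lines and 4 KiB pages (not necessarily the platform used in the thesis):

```c
/*
 * Minimal sketch of the page-coloring idea: the LLC set-index bits that
 * lie above the page offset are determined by the physical page frame
 * number (PFN), so the OS controls them through frame allocation.
 * The cache geometry below is a hypothetical example.
 */
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT      12          /* 4 KiB pages                  */
#define LLC_LINE_SHIFT  6           /* 64-byte cache lines          */
#define LLC_SET_BITS    12          /* assumed: 4096 sets           */
/* Set-index bits above the page offset are the ones the OS controls. */
#define COLOR_BITS      (LLC_SET_BITS + LLC_LINE_SHIFT - PAGE_SHIFT)
#define NUM_COLORS      (1u << COLOR_BITS)

/* Cache color of a physical page frame number. */
static inline unsigned page_color(uint64_t pfn)
{
    return (unsigned)(pfn & (NUM_COLORS - 1));
}

int main(void)
{
    /* Two frames with the same low PFN bits map to the same group of LLC sets. */
    printf("colors: %u total; pfn 0x1234 -> %u, pfn 0x5234 -> %u\n",
           NUM_COLORS, page_color(0x1234), page_color(0x5234));
    return 0;
}
```

With 6 color bits, two frames whose low six PFN bits match compete for the same one-sixty-fourth of the LLC sets, which is exactly the degree of freedom the thesis's allocation policies exploit.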
1. A sub-region software partitioning method is proposed to reduce last-level cache pollution. After entering the LLC, data with poor locality can evict frequently accessed data, which causes LLC pollution. The method uses a profile-feedback mechanism based on memory access traces to detect the poorly-local, polluting data regions of memory-intensive programs, and modifies the operating system's physical page-frame allocation policy so that the polluting data is confined to a small portion of the LLC. This protects the data with good locality in the LLC and raises the LLC hit rate. Compared with the unmodified Linux operating system, the average number of LLC misses per thousand instructions (MPKI) is reduced by 15.23% and program performance is improved by 7.01%.
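To make the mechanism concrete, the following sketch shows one way an allocator could confine profile-marked polluting regions to a few colors, so their data can only occupy a small LLC partition. The per-color free lists, the partition size, and the `in_polluting_region` flag are illustrative assumptions, not the thesis's actual Linux modifications:

```c
/*
 * Sketch of the allocation policy: pages backing a profile-identified
 * "polluting" region are drawn only from a small set of cache colors,
 * confining that data to a small LLC partition.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

#define NUM_COLORS       64
#define POLLUTE_COLORS    4   /* assumed size of the small partition */

/* One free list of physical frames per cache color. */
struct color_free_list {
    uint64_t *pfns;
    size_t    count;
};

static struct color_free_list free_lists[NUM_COLORS];
static unsigned next_pollute = 0, next_normal = POLLUTE_COLORS;

/* Pop a frame from a color's free list; 0 means "no frame available". */
static uint64_t pop_frame(unsigned color)
{
    struct color_free_list *fl = &free_lists[color];
    return fl->count ? fl->pfns[--fl->count] : 0;
}

/*
 * Allocate a frame for a faulting page.  Regions flagged as polluting by
 * the profile are served round-robin from the first few colors; all other
 * data uses the remaining colors and never competes for those LLC sets.
 */
uint64_t alloc_frame(bool in_polluting_region)
{
    if (in_polluting_region) {
        unsigned c = next_pollute;
        next_pollute = (next_pollute + 1) % POLLUTE_COLORS;
        return pop_frame(c);
    }
    unsigned c = next_normal;
    next_normal = POLLUTE_COLORS +
                  (next_normal - POLLUTE_COLORS + 1) % (NUM_COLORS - POLLUTE_COLORS);
    return pop_frame(c);
}
```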
2. A shared last-level cache optimization method for multi-core processors is proposed, combining inter-process partitioning with isolation of polluting regions. Concurrent processes, and the different data regions within each process, compete for the shared LLC of a multi-core processor, which leads to severe conflicts among shared-LLC accesses. The method detects the data regions of each application under different shared-LLC configurations and reserves a global pollution buffer in the LLC into which the polluting data regions of every concurrent process are mapped. On top of inter-process partitioning, this further improves concurrent multi-process execution on multi-core processors. Experimental results show that, compared with the Linux operating system and the inter-process partitioning method RapidMRC, overall multi-core system performance is improved by 26.31% and 5.86%, respectively.
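A sketch of how the two ideas could be combined, under an assumed color budget: each process receives a private range of colors for its well-behaved data (inter-process partitioning), while one small shared range serves as the global pollution buffer for every process's polluting regions. The sizes and the interface are placeholders, not the configuration evaluated in the thesis:

```c
/* Combined inter-process partitioning + shared pollution buffer (sketch). */
#include <stdbool.h>

#define NUM_COLORS       64
#define POLLUTE_BUF       4   /* colors reserved as the shared pollution buffer */
#define NUM_PROCS         4   /* concurrent processes sharing the LLC           */
#define COLORS_PER_PROC  ((NUM_COLORS - POLLUTE_BUF) / NUM_PROCS)

struct color_range { unsigned first, count; };

/* Color range to draw frames from for one page of one process. */
struct color_range pick_colors(unsigned proc_id, bool in_polluting_region)
{
    struct color_range r;
    if (in_polluting_region) {
        /* All processes share the same small buffer for polluting data. */
        r.first = NUM_COLORS - POLLUTE_BUF;
        r.count = POLLUTE_BUF;
    } else {
        /* Private partition: no LLC conflicts with other processes' data. */
        r.first = proc_id * COLORS_PER_PROC;
        r.count = COLORS_PER_PROC;
    }
    return r;
}
```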
3. A page-granularity, software-controlled last-level cache insertion policy with lightweight hardware support is proposed. Because the memory-access information available to them is limited, LLC management policies implemented purely in hardware have difficulty distinguishing the access behavior of the different data regions in a program and cannot effectively detect and locate poorly-local, polluting data. The method uses existing processor page-table entries to build a software-control interface for the LLC insertion policy and, guided by profile information, controls the LLC insertion position of data from the polluting regions. It incurs little hardware overhead and, built on top of the tournament (set-dueling) insertion mechanisms already implemented in hardware, further reduces LLC pollution and improves processor memory performance. Experimental results show that, compared with the existing LRU, DIP and DRRIP policies, LLC MPKI is reduced by 14.33%, 9.68% and 6.24%, and average processor performance is improved by 8.3%, 6.23% and 4.24%, respectively.
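As an illustration of what such a page-table-based interface could look like, the sketch below assumes one otherwise-unused PTE bit that the hardware interprets as an insertion-position hint for lines of that page; the bit position, the `pte_t` layout, and the hardware-side age mapping are all hypothetical, not the thesis's actual encoding:

```c
/*
 * Sketch of a page-granularity insertion hint carried in a PTE.
 * Assumption: the hardware treats one software-available PTE bit as
 * "insert this page's lines near the eviction end of the LLC".
 */
#include <stdint.h>
#include <stdbool.h>

typedef uint64_t pte_t;

#define PTE_PRESENT        (1ULL << 0)
#define PTE_LLC_LOW_INSERT (1ULL << 52)   /* assumed software-available bit */

/*
 * Under profile guidance the OS marks pages of a polluting region so that
 * on an LLC miss their lines are inserted at a low-priority position,
 * where they are evicted quickly instead of displacing hot data.
 */
static inline pte_t set_insertion_hint(pte_t pte, bool polluting)
{
    return polluting ? (pte | PTE_LLC_LOW_INSERT)
                     : (pte & ~PTE_LLC_LOW_INSERT);
}

/* Hardware-side view (conceptual): choose the insertion age on a fill. */
static inline unsigned insertion_age(pte_t pte, unsigned max_age)
{
    /* max_age - 1 == near eviction; 0 == most recently used. */
    return (pte & PTE_LLC_LOW_INSERT) ? max_age - 1 : 0;
}
```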
4. A software/hardware cooperative last-level cache management strategy for virtual-address regions is proposed. While a program runs, data in a contiguous virtual-address region is usually mapped onto scattered physical page frames, so existing LLC performance monitors have difficulty collecting statistics per data region and cannot guide run-time optimization. The method first designs a region-aware LLC performance monitor for the virtual address space, which records LLC access information for the different data regions within a program. Second, an online profiling method supported by the per-region performance monitor is designed, which analyzes region behavior at run time. Finally, an LLC software-control interface is designed; guided by the profile information, the operating system configures an appropriate bypass or insertion policy for each data region. These gains are obtained without a significant increase in hardware overhead. Experimental results show that, compared with the existing LRU, DIP and DRRIP policies, average processor performance is improved by 8.05%, 5.94% and 4.01%, respectively.
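A rough sketch of the run-time loop this enables: per-region access and miss counters (plain structs here, standing in for the hardware monitor) are periodically inspected by the OS, which then selects a bypass or insertion policy for each region. The region granularity, thresholds, and policy enumeration are assumptions for illustration only:

```c
/*
 * Sketch of per-region monitoring plus policy selection.  In the thesis
 * the counters live in a hardware LLC performance monitor indexed by
 * virtual-address region; here they are ordinary fields.
 */
#include <stdint.h>

#define MAX_REGIONS 16

enum llc_policy { LLC_INSERT_MRU, LLC_INSERT_LRU, LLC_BYPASS };

struct region_stats {
    uint64_t start, end;          /* virtual address range of the region */
    uint64_t accesses, misses;    /* sampled by the per-region monitor   */
    enum llc_policy policy;
};

/*
 * Online pass run periodically by the OS: regions whose miss ratio stays
 * high get a low-priority insertion or bypass the LLC entirely; the rest
 * keep the default insertion policy.  Thresholds are placeholders.
 */
void tune_regions(struct region_stats *r, int n)
{
    for (int i = 0; i < n; i++) {
        if (r[i].accesses == 0)
            continue;
        double miss_ratio = (double)r[i].misses / (double)r[i].accesses;
        if (miss_ratio > 0.9)
            r[i].policy = LLC_BYPASS;        /* streaming / no reuse   */
        else if (miss_ratio > 0.5)
            r[i].policy = LLC_INSERT_LRU;    /* weak reuse: evict soon */
        else
            r[i].policy = LLC_INSERT_MRU;    /* good reuse: keep hot   */
    }
}
```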
[Degree-granting institution]: Peking University (北京大学)
[Degree level]: Doctoral
[Year conferred]: 2013
[CLC number]: TP333
Document ID: 2160641
Link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2160641.html