Research on Lock-Free Queues in Parallelized Dynamic Binary Translation on the Loongson Platform
Published: 2018-11-19 21:17
[Abstract]: In recent years, mainstream desktop and server software has been developed for the x86 platform, whereas Loongson (Godson) processors are based on the MIPS instruction set. How to make existing x86 software run on the MIPS architecture has therefore become an important issue in the development of domestic chips. Binary translation is an important way to make x86 software compatible with Loongson CPUs. At present, QEMU is the main full-system emulator used on the Loongson platform. Through binary translation it can already run the Windows XP operating system on Loongson, but its performance needs to be improved.
Since processor clock frequencies passed 2 GHz, the gains available from raising the efficiency of a single processor have become increasingly limited, and multi-core processors have become the trend in order to keep Moore's law valid. Existing full-system emulation, however, is serial and uses only a single core of the host processor, so parallelizing full-system emulation is urgent. Truly parallel system-level emulation would greatly improve the speed and performance of the emulated machine and would ultimately support the commercialization of domestic chips.
How to let the Loongson platform exploit its core count when emulating x86 has gradually become a research focus. Parallel full-system emulation based on QEMU is already being studied: PQEMU, HQEMU, and COREMU each parallelize QEMU from a different angle, but none of these parallel full-system emulators uses Loongson as the host machine. This thesis analyzes how QEMU works, how QEMU emulates SMP machines, and how existing parallel versions of QEMU are implemented. It changes QEMU's emulation of SMP machines from serial to parallel: the emulation logic for each core of the SMP machine is encapsulated in its own thread, the operating system schedules these threads, and the threads run concurrently on multiple Loongson cores, so that a multi-core Loongson emulates a multi-core x86 machine. This parallelization approach has to solve two key problems: the translation of atomic instructions and the emulation of interrupts on the SMP machine.
The author's research group previously proposed an atomic-instruction translation scheme based on GCC's built-in atomic functions. The author found two problems with that scheme: first, the code expansion when translating simple instructions exceeds that of complex instructions; second, when handling unaligned atomic instructions, it cannot cover every case that may arise. This thesis proposes a new scheme that translates atomic instructions directly with the MIPS ll/sc instruction pair. The new scheme has no redundant operations and fully solves the translation of unaligned atomic instructions. Interrupts of the SMP machine are emulated with Linux real-time signals and a FIFO queue. To keep interrupt emulation efficient, the FIFO queue needs to be implemented with lock-free techniques. Based on the characteristics of the MIPS ll/sc instruction pair and of the lock-free queue used in interrupt emulation, the thesis proposes a lock-free queue algorithm that avoids the ABA problem and greatly improves the efficiency of interrupt emulation. As a result, QEMU runs in parallel on the Loongson 3A platform and makes full use of the core count of the Loongson host.
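
The abstract does not give the queue algorithm itself, so the following is only an illustrative sketch of the general idea, not the thesis's ll/sc-based method. It shows a bounded FIFO for pending interrupt numbers that uses C11 atomic compare-and-swap and per-slot sequence counters instead of a mutex, following the well-known bounded-queue design attributed to Dmitry Vyukov. On MIPS, compilers typically lower such compare-and-swap operations to ll/sc loops. All names (`irq_queue_t`, `irq_queue_push`, `irq_queue_pop`, `QCAP`) are hypothetical and do not come from QEMU or the thesis.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define QCAP 8 /* hypothetical capacity; must be a power of two */
static_assert((QCAP & (QCAP - 1)) == 0, "QCAP must be a power of two");

typedef struct {
    atomic_size_t seq; /* per-slot sequence counter */
    int irq;           /* payload: a pending interrupt number */
} irq_cell_t;

typedef struct {
    irq_cell_t cells[QCAP];
    atomic_size_t enq_pos; /* next position to write */
    atomic_size_t deq_pos; /* next position to read */
} irq_queue_t;

void irq_queue_init(irq_queue_t *q) {
    for (size_t i = 0; i < QCAP; i++)
        atomic_init(&q->cells[i].seq, i);
    atomic_init(&q->enq_pos, 0);
    atomic_init(&q->deq_pos, 0);
}

/* Returns false if the queue is full. */
bool irq_queue_push(irq_queue_t *q, int irq) {
    size_t pos = atomic_load_explicit(&q->enq_pos, memory_order_relaxed);
    for (;;) {
        irq_cell_t *c = &q->cells[pos & (QCAP - 1)];
        size_t seq = atomic_load_explicit(&c->seq, memory_order_acquire);
        ptrdiff_t diff = (ptrdiff_t)seq - (ptrdiff_t)pos;
        if (diff == 0) {
            /* Slot is free for this position: try to claim it. */
            if (atomic_compare_exchange_weak_explicit(
                    &q->enq_pos, &pos, pos + 1,
                    memory_order_relaxed, memory_order_relaxed)) {
                c->irq = irq;
                /* Publish the payload to consumers. */
                atomic_store_explicit(&c->seq, pos + 1, memory_order_release);
                return true;
            }
            /* CAS failed: pos now holds the current enq_pos; retry. */
        } else if (diff < 0) {
            return false; /* slot still holds an unread item: full */
        } else {
            pos = atomic_load_explicit(&q->enq_pos, memory_order_relaxed);
        }
    }
}

/* Returns false if the queue is empty. */
bool irq_queue_pop(irq_queue_t *q, int *irq) {
    size_t pos = atomic_load_explicit(&q->deq_pos, memory_order_relaxed);
    for (;;) {
        irq_cell_t *c = &q->cells[pos & (QCAP - 1)];
        size_t seq = atomic_load_explicit(&c->seq, memory_order_acquire);
        ptrdiff_t diff = (ptrdiff_t)seq - (ptrdiff_t)(pos + 1);
        if (diff == 0) {
            /* Slot holds a published item for this position. */
            if (atomic_compare_exchange_weak_explicit(
                    &q->deq_pos, &pos, pos + 1,
                    memory_order_relaxed, memory_order_relaxed)) {
                *irq = c->irq;
                /* Mark the slot free for the next lap around the ring. */
                atomic_store_explicit(&c->seq, pos + QCAP, memory_order_release);
                return true;
            }
        } else if (diff < 0) {
            return false; /* nothing published here yet: empty */
        } else {
            pos = atomic_load_explicit(&q->deq_pos, memory_order_relaxed);
        }
    }
}
```

In this sketch, a slot that is reused after the ring wraps around carries a different sequence value, so a thread holding a stale position cannot mistake the recycled slot for the one it originally observed. That is one common way to guard against ABA-style confusion when only compare-and-swap is available. With ll/sc, a store-conditional fails whenever the location was written in between, which is presumably the property the thesis exploits.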
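[Degree-granting institution]: University of Science and Technology of China
[Degree level]: Master's
[Year degree conferred]: 2014
[Classification number]: TP332; TP391.2
Article ID: 2343452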
Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2343452.html