面向多核系统的程序并行化方法
发布时间:2018-05-25 10:15
本文选题:多处理器系统芯片 + 并行编程 ; 参考:《浙江大学》2012年硕士论文
【摘要】:由于集成电路工艺技术的缩微化和应用需求,高性能微处理器将多个处理器单元集成到单芯片上来实现处理器性能的提升和功耗的减少。但要充分利用多核系统所提供的硬件资源,必须解决应用程序的并行化编程问题。然而,应用程序的并行化编程问题非常复杂。1)长期以来大多数程序员都采用串行的编程模型和编程语言;2)大量现存的应用程序和算法描述均是采用串行语言编写;3)面对不同的应用程序和多处理器系统结构,寻求通用的并行编程模型十分困难。因此,到目前为止研究人员还没有较好的方法来解决应用程序的并行化编程。 我们针对上述并行化编程的困难,提出一种面向C程序的编程并行化方法,由程序员指导,工具辅助,实现半自动化的开发源程序的并行性。该方法结合流应用程序的特点,通过关注多层循环结构来开发应用程序中隐藏的并行性,提出了既适用于任务并行模式,又适用于流水并行模式的并行策略。 本文提出的并行编程方法包括四个步骤。1)对源代码进行程序分析,结合典型数据运行驱动剖析和依赖分析,获取应用程序的执行开销、条件分支选择、函数调用关系、存储占用、函数访问的变量以及依赖关系等信息,建立任务依赖图模型。2)对任务依赖图模型进行变换,消除其中的控制依赖和冗余迭代间数据依赖,聚合环状依赖,以生成适于调度的有向无环图。3)进行任务调度,为任务设置两级优先级,任务映射过程中充分考虑分支任务的互斥性。采用启发式方法,通过反馈优化,逐步求精,获取并行化方案。4)封装任务,生成可执行代码,在多核平台上进行性能的评估。 我们选用T264解码程序和AES加密程序为实验对象,分别在2核、4核和8核的硬件平台上进行评估。本文提出的并行化方法在8核平台上,AES和T264相对于单核处理器分别有5.12和5.62的加速比。实验结果证明了本方法的有效性和良好的可扩展性。 本文所提出的并行化方法继承了以往的并行化研究的经验,采用启发式框架,同时兼具自己的特色:1)定义了任务依赖图(task dependence graph, TDG)模型。该模型适用于任何应用程序,能够有效表示程序中的控制依赖和数据依赖。2)针对存在错综复杂依赖关系的任务图模型提出一种有别于传统聚合消除法的变换规则。分别对分支、循环和迭代间依赖采用不同的技术,能够有效解除环状依赖。3)将动态优先级与静态优先级相结合。动态优先级保证了关键路径上的任务总能被优先处理,而静态优先级综合考虑了任务的计算开销和存储占用情况。任务到处理器的映射方法以尽量减少的执行时间为优化目标,还考虑了存在分支选择的情况。4)不仅仅对任务并行模式进行了研究,还针对多层循环结构探索了相关的流水并行技术。
[Abstract]:Due to the miniaturization and application requirements of integrated circuit technology, high-performance microprocessors integrate multiple processor units into a single chip to improve processor performance and reduce power consumption. However, in order to make full use of the hardware resources provided by multi-core systems, the parallel programming problem of application programs must be solved. However, The problem of parallel programming of applications is very complex. 1) for a long time, most programmers have adopted serial programming models and programming languages.) A large number of existing application and algorithm descriptions are written in serial language. Different applications and multiprocessor system architectures, It is difficult to find a common parallel programming model. So far, researchers do not have a good solution to parallelization of applications. In view of the difficulties of parallel programming, we propose a method of programming parallelization for C programs, which is directed by programmers and assisted by tools to realize the parallelism of semi-automatic programming source programs. Based on the characteristics of flow applications and by focusing on the hidden parallelism in applications, a parallel strategy is proposed, which is suitable for both task and pipelined parallel patterns. The parallel programming method proposed in this paper includes four steps. 1) analyzing the source code, combining with the typical data running driven analysis and dependency analysis, obtaining the execution cost of the application, the selection of conditional branches, and the function calling relationship. By storing the information of occupation, function access variables and dependency relationships, a task-dependent graph model .2) is established to transform the task-dependent graph model to eliminate the control dependency and redundant iterative data dependency, and to aggregate the ring dependency. The task scheduling is carried out by generating a directed acyclic graph. 3) and two levels of priority are set for the task. In the process of task mapping, the mutuality of branch tasks is fully considered. By using heuristic method, feedback optimization, refinement step by step, parallelization scheme. 4) encapsulation task is obtained, executable code is generated, and performance evaluation is carried out on multi-core platform. We choose T264 decoder and AES encryption program as experimental objects and evaluate them on the hardware platform of 2 core 4 core and 8 core respectively. The parallelization method proposed in this paper has speedup ratios of 5.12 and 5.62 for AES and T264 on 8-core platform, respectively. Experimental results show that this method is effective and extensible. The parallelization method proposed in this paper inherits the experience of previous parallelization research, and uses heuristic framework and has its own characteristics: 1) to define the task dependence graph, TDG) model of task dependency graph. This model is suitable for any application and can effectively represent the control and data dependencies in the program. 2) for the task graph model with complex dependencies, a transformation rule is proposed, which is different from the traditional aggregation elimination method. Different techniques are used for branch, loop and iteration dependencies, which can effectively remove the ring dependence. 3) combining dynamic priority with static priority. Dynamic priority ensures that tasks in critical paths can always be prioritized, while static priority takes into account the computational overhead and storage footprint of tasks. Task-to-processor mapping aims at minimizing execution time, and considers the presence of branching options. The related pipelining parallel technology is also explored for multi-layer loop structure.
【学位授予单位】:浙江大学
【学位级别】:硕士
【学位授予年份】:2012
【分类号】:TP311.1;TP332
【参考文献】
相关博士学位论文 前1条
1 高丰;基于SOC的实时操作系统的研究[D];浙江大学;2002年
,本文编号:1933060
本文链接:https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/1933060.html