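A Parallelization Method for Audio Decoding Programs on Embedded Multicore Systems
Published: 2018-04-24 11:25
Topics: parallel programming + program analysis; Source: Zhejiang University, master's thesis, 2015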
[Abstract]: With the development of multicore processors, software multithreading is increasingly the bottleneck that limits program performance. Because a large class of the programs that run on embedded multicore processors are multimedia decoders, this thesis studies methods for parallelizing them.

Program parallelization consists of four steps: 1) parallelism analysis, 2) parallelization planning, 3) code generation, and 4) runtime management. This thesis focuses on parallelism analysis, which comprises two steps: program structure analysis and parallel region detection.

The program structure analysis combines dynamic and static analysis. The code is first precompiled and then analyzed dynamically, and the measured costs are written back into the source code as annotations. A static analysis then builds the program call graph, in which functions and loops are the nodes, the call relationships among functions and loops are the directed edges, and the value on each node is its cost. The resulting call graph serves as a reference for the subsequent parallel region detection.

The parallel region detection combines detection at several granularities. For data parallelism, it detects consecutive reads and writes to consecutive addresses; for task parallelism, it detects read-write dependencies between functions; for pipeline parallelism, it detects dependencies among the tasks inside a loop. Because the detection is based on dynamic analysis, it avoids the conservative estimates that static analysis produces.

Using APE and MP3 decoders as test programs, we evaluated the method on 2-core and 4-core multicore platforms in a software simulator and obtained speedups of 7.28 and 3.97, respectively, with power ratios of 0.29 and 0.47, respectively. The method reduces power consumption while improving speed, which confirms its effectiveness and good scalability.
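The task-parallel check described in the abstract (read-write dependencies between functions, observed dynamically) can be illustrated with a minimal sketch. This is not the thesis's implementation: the trace format `(function, op, address)`, the function names, and the addresses are all assumptions made for illustration, and the independence test used here is the standard Bernstein conditions rather than anything the abstract specifies.

```python
from collections import defaultdict
from itertools import combinations


def access_sets(trace):
    """Collect per-function read and write address sets from a dynamic trace.

    Each trace entry is (function_name, op, address), with op "R" or "W".
    This trace format is an assumption for the sketch.
    """
    reads, writes = defaultdict(set), defaultdict(set)
    for func, op, addr in trace:
        (reads if op == "R" else writes)[func].add(addr)
    return reads, writes


def independent_pairs(trace):
    """Return function pairs with no read-write dependency in the trace.

    Two functions are treated as independent when Bernstein's conditions
    hold: neither writes an address that the other reads or writes.
    """
    reads, writes = access_sets(trace)
    funcs = sorted(set(reads) | set(writes))
    result = []
    for a, b in combinations(funcs, 2):
        conflict = (writes[a] & (reads[b] | writes[b])) or (writes[b] & reads[a])
        if not conflict:
            result.append((a, b))
    return result


# Hypothetical trace: two channel decoders touch disjoint buffers,
# and a mixing step reads the output of both.
trace = [
    ("decode_left", "R", 0x100), ("decode_left", "W", 0x200),
    ("decode_right", "R", 0x104), ("decode_right", "W", 0x204),
    ("mix", "R", 0x200), ("mix", "R", 0x204), ("mix", "W", 0x300),
]

print(independent_pairs(trace))
```

In this hypothetical trace only the two channel decoders are reported as independent, since `mix` reads addresses that both of them write. A result of this kind only reflects the inputs that were actually executed, which is the usual trade-off of dynamic analysis against the conservative estimates of static analysis.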
[Degree-granting institution]: Zhejiang University
[Degree level]: Master's
[Year degree conferred]: 2015
[Classification number]: TP332