Research on a Universal Programming Model and System Architecture for Parallel Computing

Published: 2018-05-09 08:48

Topics: parallel computing + cloud computing; source: Beijing University of Posts and Telecommunications, doctoral dissertation, 2012

[Abstract]: Information and data are highly valuable to every industry, yet analyzing and processing massive data sets in a timely manner remains difficult. Over the past decade, as industries have become more information-intensive, many of them have seen rapid growth in data volume. To meet the need for timely analysis and processing of large-scale data, more and more fields have begun to adopt parallel computing. In the past five to eight years, the research and application of parallel computing programming models have spread from specialized fields to IT, e-commerce, and other highly information-intensive industries.
Parallel computing is not a new technology. Decades have passed since the concept was first proposed; it has a long research history in many specialized fields and has produced many results. As the application fields have changed, however, the usage scenarios and requirements have changed greatly as well, and research on general-purpose parallel computing programming models and systems built on cluster resources is still scarce. As more industries adopt the technology, the demand for universally applicable parallel computing will keep growing, which brings both opportunities and challenges for research on general-purpose parallel computing programming models and systems. In recent years this research has begun to take shape, and many general-purpose programming models and systems have been proposed, such as MapReduce and Dryad, but many problems remain unsolved.
(1) Generality of models and systems. Most models and systems were proposed to meet the needs of a single kind of problem, so the range of problem types they cover is limited, and the problem at hand usually has to be transformed before they can be used. In addition, the task processing flow is fixed in the system design, which leaves model-based program design inflexible.
(2) Scalability of systems. General-purpose parallel systems are usually built on large-scale clusters, but their designs give too little consideration to resource expansion. As clusters keep growing and the task volume keeps rising, the control core of the system has begun to struggle with its load.
(3) The layer at which a general architecture is positioned. A design in which the architecture manages resources and models carry the tasks can make a cluster more general-purpose, but it brings no benefit to any specific computing model. Unless the task management flow can be abstracted, the generality of the architecture is confined to the resource allocation layer.
(4) Exploring application domains for the model. The strong performance of general-purpose parallel computing on massive data processing has steered the solution of many problems toward parallel processing, but not every problem is suited to it, so the scope of application of the model deserves consideration.
To address these problems, this dissertation carries out the following work:
(1) General-purpose parallel computing programming models and systems are studied, and the strengths and weaknesses of existing models are analyzed and summarized. A parallel-pattern design that layers functional parallelism, temporal parallelism, and data parallelism on top of one another is proposed, which widens the range of application of the model. On the system side, control of an application's execution flow is designed as a special task, which allows more varied execution flows and more flexible application design. Together, these two design innovations improve the generality of the parallel computing programming model and system.
(2) The scalability of general-purpose parallel systems is studied, and the main causes of scalability problems in existing systems are analyzed and summarized. A distributed multi-control-point system architecture is proposed, in which distributed management by multiple control points replaces centralized management by a single control point. The mechanism for sending and processing resource signaling is optimized, and the management and scheduling of concurrent applications are split across different control points. This addresses the scalability problems caused by the limited resources of a control point and the growing task load, and so improves the scalability of the system.
(3) General-purpose system architecture is studied, and a sustainably scalable cluster architecture is proposed. While solving the system scalability problem, it makes the cluster architecture better suited to hosting general-purpose parallel computing models. The new architecture separates resource management from task scheduling and gives the management modules a hierarchical design, so that the control layer is also scalable. It also abstracts task management: common task management functions are integrated into the architecture, while flow definition and control are left to the task management nodes to implement.
(4) The application of general-purpose parallel computing to network state analysis is studied. Based on the characteristics of general-purpose parallel computing systems, the types of problems they suit are analyzed. Taking traffic congestion adjustment as the main object of study, a parallel algorithm is designed and a parallel system is used to accelerate processing, which shortens the processing time. This work is an attempt to find a way to process network state analysis problems in parallel.
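As a rough illustration of contribution (2), the sketch below spreads concurrent applications over several control points rather than routing all of them through one master. It is only a minimal sketch under stated assumptions: the abstract does not say how applications are mapped to control points, so the stable-hash rule, the function names `pick_control_point` and `assign_applications`, and the identifiers `cp-0` and `app-0` are all hypothetical and are not taken from the dissertation.

```python
import hashlib
from collections import defaultdict


def pick_control_point(app_id, control_points):
    """Map one application to one control point using a stable hash.

    Assumption: hashing is just one possible placement rule; the
    dissertation's actual assignment policy is not given in the abstract.
    """
    if not control_points:
        raise ValueError("at least one control point is required")
    digest = hashlib.sha256(app_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer index.
    index = int.from_bytes(digest[:8], "big") % len(control_points)
    return control_points[index]


def assign_applications(app_ids, control_points):
    """Group applications by the control point that would manage them."""
    assignment = defaultdict(list)
    for app_id in app_ids:
        assignment[pick_control_point(app_id, control_points)].append(app_id)
    return dict(assignment)


if __name__ == "__main__":
    points = ["cp-0", "cp-1", "cp-2"]          # hypothetical control points
    apps = [f"app-{i}" for i in range(9)]      # hypothetical applications
    for point, managed in sorted(assign_applications(apps, points).items()):
        print(point, managed)
```

Because the mapping depends only on the application identifier and the list of control points, any node can compute it without asking a central coordinator, which is the property the multi-control-point idea relies on. A real system would also have to handle control points joining or leaving and uneven application sizes, which this sketch ignores.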

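[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Doctoral
[Year degree granted]: 2012
[Classification number]: TP338.6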

[Similar literature]