Research on Key Technologies of an MPI-Based Multi-Layer Fault-Tolerant High-Performance Cloud Computing Platform
Published: 2018-05-27 09:42
Topics: MPI + fault tolerance; Source: Master's thesis, Wuhan University of Technology, 2013
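【Abstract】: With the advance of global informatization and the continual evolution of computer application technology, industries must process ever larger volumes of information. In fields such as aerospace, ocean exploration, and weather forecasting, data sets have reached the TB and even PB scale, so storing and processing data at this scale has become critical. Cloud computing platforms were introduced to address this problem. Such platforms have two defining features: they store big data in a distributed fashion, and they treat task failure as a normal condition rather than an exception. On the other hand, many cloud platforms are unsuitable for low-latency services and are inefficient for compute-intensive tasks. MPI, by contrast, excels at compute-intensive workloads, with fast communication and low message-passing latency, which makes an MPI-based cloud platform well worth building. This study therefore focuses on how to design and implement an MPI cloud platform that supports big-data storage and provides multi-layer fault tolerance.

To this end, this thesis proposes and implements an MPI-based cloud platform. To support big-data storage, it builds a distributed MySQL cluster in which different nodes store different data, and adds a database middleware layer on top that joins these database nodes into a single whole. Users do not need to be aware of this storage architecture: using it feels much like using a single MySQL instance. In addition, because MPI itself provides no corresponding fault-tolerance mechanism, the author designs a three-layer fault-tolerance mechanism: rescheduling of failed tasks, task checkpoint/restart, and process migration. The mechanism is separated out behind an independent interface, so that platform developers can customize it to their own requirements and extend it further, while users can set the fault-tolerance level according to their actual needs.

Testing and evaluation show that the database middleware running on the MySQL-based distributed cluster can handle users' SQL requests, supporting data queries as well as basic insert, delete, and update operations, and that the platform copes well with node service failures and ultimately returns correct results to the user. The feasibility, reliability, robustness, and efficiency of the prototype system all met the design goals.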
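The three fault-tolerance layers described above differ mainly in how much completed work survives a failure. The sketch below is a minimal, single-process Python illustration under stated assumptions, not the thesis's implementation: it does not use MPI, the names `FaultTolerancePolicy`, `FTLevel`, `Task`, `POLICIES`, and `run` are hypothetical, a node failure is simulated by injecting one fault at a chosen step, and a checkpoint is modeled as a saved step counter. Each layer is a separate policy behind one common interface, mirroring the idea that the mechanism is exposed as a customizable interface and that the user picks the level.

```python
# Hypothetical sketch: not the thesis's MPI code; failures are simulated in-process.
from abc import ABC, abstractmethod
from dataclasses import dataclass, replace
from enum import IntEnum


class FTLevel(IntEnum):
    """Hypothetical user-selectable fault-tolerance levels."""
    RESCHEDULE = 1   # rerun the failed task from scratch on a healthy node
    CHECKPOINT = 2   # restart on a healthy node from the last saved checkpoint
    MIGRATE = 3      # move the live task state to a healthy node


@dataclass(frozen=True)
class Task:
    task_id: int
    total_steps: int
    done_steps: int = 0     # progress of the running attempt
    checkpoint: int = 0     # last persisted progress (assumed stable storage)
    node: str = "node0"


class FaultTolerancePolicy(ABC):
    """Common interface so developers can plug in their own recovery logic."""

    @abstractmethod
    def recover(self, task: Task, healthy_nodes: list[str]) -> Task:
        ...


class Reschedule(FaultTolerancePolicy):
    def recover(self, task, healthy_nodes):
        # All progress is lost; the task starts over elsewhere.
        return replace(task, done_steps=0, node=healthy_nodes[0])


class CheckpointRestart(FaultTolerancePolicy):
    def recover(self, task, healthy_nodes):
        # Only the work done since the last checkpoint is lost.
        return replace(task, done_steps=task.checkpoint, node=healthy_nodes[0])


class Migrate(FaultTolerancePolicy):
    def recover(self, task, healthy_nodes):
        # Live state moves with the process, so no progress is lost.
        return replace(task, node=healthy_nodes[0])


POLICIES = {
    FTLevel.RESCHEDULE: Reschedule(),
    FTLevel.CHECKPOINT: CheckpointRestart(),
    FTLevel.MIGRATE: Migrate(),
}


def run(task, policy, healthy_nodes, fail_at=None, checkpoint_every=0):
    """Simulate a task; inject one node failure when progress reaches fail_at.

    Returns (steps actually executed, node the task finished on).
    """
    executed = 0
    failed = False
    while task.done_steps < task.total_steps:
        if not failed and task.done_steps == fail_at:
            failed = True
            task = policy.recover(task, healthy_nodes)
            continue
        task = replace(task, done_steps=task.done_steps + 1)
        executed += 1
        if checkpoint_every and task.done_steps % checkpoint_every == 0:
            task = replace(task, checkpoint=task.done_steps)
    return executed, task.node


if __name__ == "__main__":
    # 10-step task, checkpoint every 4 steps, node failure after step 7.
    for level in FTLevel:
        steps, node = run(Task(1, 10), POLICIES[level], ["node1"],
                          fail_at=7, checkpoint_every=4)
        print(f"{level.name}: {steps} steps executed, finished on {node}")
```

For this toy scenario the script reports 17, 13, and 10 executed steps for RESCHEDULE, CHECKPOINT, and MIGRATE: rescheduling reruns everything, checkpoint/restart repeats only the work since step 4, and migration repeats nothing. In a real MPI deployment, checkpointing and migration carry their own costs (capturing and transferring process state) that this model ignores.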
[English Abstract]: With the advance of global informatization and the continual evolution of computer application technology, the amount of information that each industry must process keeps growing, especially in fields such as aerospace, ocean exploration, and weather forecasting, where data volumes have reached the TB or even PB scale. How to store and process data at this scale has therefore become critical, and the cloud computing platform was introduced to solve this problem. A cloud computing platform has two characteristics: it stores big data in a distributed manner, and it treats task execution failure as a normal condition.
However, many cloud platforms are unsuitable for low-latency services and are inefficient for compute-intensive tasks. MPI, by contrast, is well suited to compute-intensive work, with fast communication and low message-passing latency, so implementing a cloud platform with MPI is worthwhile. This study focuses on how to build and implement an MPI cloud platform that supports large-scale data storage and multi-layer fault tolerance.
To address these problems, an MPI-based cloud platform is proposed and implemented. To enable the platform to support large-scale data storage, a distributed cluster built from MySQL is implemented, and a database middleware layer is added on top of it so that the database nodes can be joined together.
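The middleware idea, several MySQL nodes each holding a different slice of the data behind one entry point, can be illustrated with a minimal routing sketch. This is an assumption-laden illustration, not the thesis's design: Python's built-in `sqlite3` in-memory databases stand in for the MySQL nodes, the class `ShardingMiddleware` and its methods are hypothetical, and rows are placed by `id % number_of_nodes`. A real middleware would parse the user's SQL text; here explicit methods keep the focus on routing. Requests that carry the shard key go to exactly one node, while a query without it is sent to every node and the partial results are merged.

```python
import sqlite3


class ShardingMiddleware:
    """Hypothetical middleware that hides several database nodes behind one API.

    Each in-memory SQLite database stands in for one MySQL node; rows are
    placed on node (id % number_of_nodes). Illustration only.
    """

    def __init__(self, n_nodes: int):
        self.nodes = [sqlite3.connect(":memory:") for _ in range(n_nodes)]
        for conn in self.nodes:
            conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

    def _node_for(self, user_id: int) -> sqlite3.Connection:
        # Requests that carry the shard key touch exactly one node.
        return self.nodes[user_id % len(self.nodes)]

    def insert(self, user_id: int, name: str) -> None:
        conn = self._node_for(user_id)
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (user_id, name))

    def update(self, user_id: int, name: str) -> int:
        conn = self._node_for(user_id)
        with conn:
            cur = conn.execute("UPDATE users SET name = ? WHERE id = ?", (name, user_id))
        return cur.rowcount

    def delete(self, user_id: int) -> int:
        conn = self._node_for(user_id)
        with conn:
            cur = conn.execute("DELETE FROM users WHERE id = ?", (user_id,))
        return cur.rowcount

    def get(self, user_id: int):
        row = self._node_for(user_id).execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
        return row[0] if row else None

    def find_by_name_prefix(self, prefix: str):
        # No shard key: scatter the query to every node, then merge by id.
        rows = []
        for conn in self.nodes:
            rows.extend(conn.execute(
                "SELECT id, name FROM users WHERE name LIKE ?", (prefix + "%",)).fetchall())
        return sorted(rows)


if __name__ == "__main__":
    mw = ShardingMiddleware(3)
    for uid, name in [(1, "alice"), (2, "bob"), (3, "carol"), (4, "alan")]:
        mw.insert(uid, name)
    print(mw.get(2))                     # bob (routed to node 2 only)
    print(mw.find_by_name_prefix("al"))  # [(1, 'alice'), (4, 'alan')]
```

Modulo placement keeps routing trivial, but changing the number of nodes relocates most rows; range-based placement or consistent hashing are common alternatives when nodes are added over time.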
Testing and evaluation show that the database middleware running on the MySQL-based distributed cluster can handle users' SQL requests, supporting data queries and basic insert, delete, and update operations, and that the platform handles node service failures well and ultimately returns correct results to the user. The feasibility, reliability, robustness, and efficiency of the prototype system meet the design expectations.
【Degree-granting institution】: Wuhan University of Technology
【Degree level】: Master's
【Year conferred】: 2013
【Classification code】: TP333; TP302.8
Article No.: 1941532
Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/1941532.html