Establishment and Optimization of a Distributed JobTracker Node Model in Cloud Computing
[Abstract]: Cloud computing, following large-scale computers, personal computers and the Internet, is regarded as the fourth revolution in the IT industry. Google was the first to define and develop cloud computing. Hadoop, an open-source implementation of the cloud computing model, is a Java-based distributed computing platform for running distributed, data-intensive applications. Single points of failure are a bottleneck for Hadoop's performance. For the single NameNode in the storage architecture (HDFS), Hadoop 2.0 introduced a multi-node high-availability scheme, but there is no corresponding solution for the single JobTracker node. This paper proposes a distributed JobTracker node model to remedy single-JobTracker failure in the traditional computing architecture, so that job failures caused by the failure of a single JobTracker node can be avoided automatically. The main contents and contributions of this paper are as follows. The paper analyzes in detail the improvement of the single-JobTracker node model and the optimization of the scheduling and load-balancing algorithms. First, the distributed JobTracker node model is built on a study of the shortest-path algorithm (Dijkstra), the web-page ranking algorithm (PageRank) and the web-page de-duplication algorithm (Bloom Filter). Dijkstra's algorithm is used to optimize the many-to-many communication between nodes in the distributed JobTracker model, so that communication between the multiple JobTracker nodes and the task nodes is balanced. Second, job scheduling is optimized based on the PageRank algorithm. Finally, the Counting Bloom Filter algorithm is used to track the number of tasks on each node, optimizing node load in the distributed JobTracker model.
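The abstract says Dijkstra's shortest-path algorithm balances communication between multiple JobTracker nodes and task nodes, but gives no details. The following is a minimal Python sketch under stated assumptions: the cluster is modeled as a weighted graph of link costs, and a task node connects to the live JobTracker with the lowest shortest-path cost. The topology, the node names (`tt1`, `sw`, `jt1`, `jt2`) and the helper `select_jobtracker` are hypothetical illustrations, not the thesis's implementation.

```python
import heapq
from typing import Dict, List, Set, Tuple

# Hypothetical weighted topology: node -> [(neighbor, link cost), ...]
Graph = Dict[str, List[Tuple[str, float]]]

TOPOLOGY: Graph = {
    "tt1": [("sw", 1.0)],                              # task node behind a switch
    "sw": [("tt1", 1.0), ("jt1", 2.0), ("jt2", 1.0)],
    "jt1": [("sw", 2.0)],                              # JobTracker 1
    "jt2": [("sw", 1.0)],                              # JobTracker 2
}


def dijkstra(graph: Graph, source: str) -> Dict[str, float]:
    """Shortest path cost from source to every reachable node."""
    dist: Dict[str, float] = {source: 0.0}
    heap: List[Tuple[float, str]] = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale entry: a shorter path to node was already settled
        for neighbor, cost in graph.get(node, []):
            candidate = d + cost
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


def select_jobtracker(graph: Graph, task_node: str,
                      jobtrackers: List[str], alive: Set[str]) -> str:
    """Hypothetical helper: nearest live, reachable JobTracker (ties broken by name)."""
    dist = dijkstra(graph, task_node)
    candidates = [jt for jt in jobtrackers if jt in alive and jt in dist]
    if not candidates:
        raise RuntimeError("no live JobTracker is reachable")
    return min(candidates, key=lambda jt: (dist[jt], jt))


print(select_jobtracker(TOPOLOGY, "tt1", ["jt1", "jt2"], {"jt1", "jt2"}))  # jt2
print(select_jobtracker(TOPOLOGY, "tt1", ["jt1", "jt2"], {"jt1"}))         # jt1
```

When `jt2` is dropped from the live set, the same call falls back to `jt1`; this is the kind of automatic failover the distributed model aims for, here reduced to a single path-cost comparison.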
After analyzing the communication mode of the distributed JobTracker node model and the related scheduling optimizations, a small Hadoop experimental cluster was built to verify the results. The experimental results show that the distributed JobTracker node model is more reliable than the single JobTracker node model when a node in the cluster goes down, and that the Dijkstra-based communication mode selects JobTracker nodes more quickly. For the improved job scheduling algorithm, when submitted jobs depend on one another, the PageRank-based algorithm further reduces the overall job processing time. The improved load-balancing algorithm optimizes cluster load with respect to the storage load of replicas, improving the utilization of storage space for replicated data. Finally, the overall performance of the clusters is compared. Because the optimizations target specific jobs, the overall job-processing performance of the distributed JobTracker node model is lower than that of the original cluster. However, when a JobTracker node goes down, the distributed model improves the security and reliability of the cluster, which is significant for job processing in such scenarios.
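How PageRank is applied to scheduling dependent jobs is not specified here. One plausible reading, sketched in Python purely as an assumption: treat each dependency as a link from the dependent job to its prerequisite, so jobs that many others wait on accumulate rank, then run ready jobs in descending rank order while still respecting dependencies (Kahn's topological sort with a priority queue). The job names, the damping factor 0.85 and the functions `pagerank` and `schedule` are illustrative, not taken from the thesis.

```python
import heapq
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iters: int = 100, tol: float = 1e-10) -> Dict[str, float]:
    """Power-iteration PageRank; links maps each node to the nodes it points to."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Rank held by nodes with no outgoing links is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in links.items():
            for dst in targets:
                new[dst] += damping * rank[src] / len(targets)
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank


def schedule(depends_on: Dict[str, List[str]]) -> List[str]:
    """Prerequisites first; among ready jobs, higher PageRank runs first."""
    # Edge: job -> prerequisite, so rank flows to jobs that others wait on.
    rank = pagerank(depends_on)
    waiting = {job: len(set(depends_on.get(job, []))) for job in rank}
    dependents: Dict[str, List[str]] = {job: [] for job in rank}
    for job, prereqs in depends_on.items():
        for prereq in set(prereqs):
            dependents[prereq].append(job)
    ready = [(-rank[job], job) for job in rank if waiting[job] == 0]
    heapq.heapify(ready)
    order: List[str] = []
    while ready:
        _, job = heapq.heappop(ready)
        order.append(job)
        for child in dependents[job]:
            waiting[child] -= 1
            if waiting[child] == 0:
                heapq.heappush(ready, (-rank[child], child))
    if len(order) != len(rank):
        raise ValueError("dependency cycle: some jobs can never become ready")
    return order


# Hypothetical job graph: job -> jobs it depends on.
JOBS = {
    "report": ["join"],
    "join": ["clean_a", "clean_b"],
    "stats": ["clean_a"],
}
print(schedule(JOBS))  # ['clean_a', 'clean_b', 'join', 'report', 'stats']
```

`clean_a` runs before `clean_b` because two jobs depend on it, so it collects more rank; the topological constraint guarantees no job starts before its prerequisites regardless of rank.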
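The abstract states that a Counting Bloom Filter is used to track the number of tasks on each node. A counting Bloom filter replaces each bit of a Bloom filter with a small counter, so keys can be removed as well as added; as long as only previously added keys are removed, the minimum of a key's counters is an upper bound on how many times it is currently recorded. The Python sketch below assumes the key is a node identifier, added once per assigned task and removed when the task finishes, with new work sent to the node with the smallest estimate. The filter size, the hash scheme (double hashing over SHA-256) and the helper `least_loaded` are assumptions for illustration.

```python
import hashlib
from typing import List, Set


class CountingBloomFilter:
    """Bloom filter with counters instead of bits, so keys can be removed."""

    def __init__(self, size: int = 1024, num_hashes: int = 4) -> None:
        self.size = size
        self.num_hashes = num_hashes
        self.counters = [0] * size

    def _positions(self, key: str) -> Set[int]:
        # Double hashing: k positions derived from two 64-bit halves of one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        return {(h1 + i * h2) % self.size for i in range(self.num_hashes)}

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.counters[p] += 1

    def remove(self, key: str) -> None:
        positions = self._positions(key)
        if any(self.counters[p] == 0 for p in positions):
            raise KeyError(key)  # never added: decrementing would corrupt other keys
        for p in positions:
            self.counters[p] -= 1

    def __contains__(self, key: str) -> bool:
        return all(self.counters[p] > 0 for p in self._positions(key))

    def estimate(self, key: str) -> int:
        """Upper bound on how many times key is currently recorded."""
        return min(self.counters[p] for p in self._positions(key))


def least_loaded(tasks: CountingBloomFilter, nodes: List[str]) -> str:
    """Hypothetical helper: node with the smallest estimated task count."""
    return min(nodes, key=lambda node: (tasks.estimate(node), node))


tasks = CountingBloomFilter()
for node in ["node-1", "node-1", "node-1", "node-2"]:
    tasks.add(node)      # one add per task assigned to the node
tasks.remove("node-1")   # a task on node-1 finished
print(tasks.estimate("node-1"), least_loaded(tasks, ["node-1", "node-2", "node-3"]))
```

The filter trades exact per-node counts for a fixed memory footprint; hash collisions can only inflate an estimate, never deflate it, so an overloaded node is never mistaken for an idle one.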
[Degree-granting institution]: Hebei University of Engineering
[Degree]: Master's
[Year conferred]: 2016
[Classification number]: TP393.09