Research on Performance Optimization of Hadoop Cluster Systems
Published: 2019-03-17 18:20
[Abstract]: The value of cloud computing for business and scientific research has gradually gained recognition. It can deliver substantial benefits in search engines, Internet application technology, and large-scale data computation. Hadoop, as an open-source implementation of cloud computing technology, has played an important role in the development of the field, and most enterprises and research projects now use it as their cloud computing platform. With its simple parallel programming model, large data storage capacity, and efficient computation, Hadoop offers users a good experience. However, because Hadoop is still relatively young, many parts of the system can be improved before it reaches its full performance. Research on Hadoop system performance is therefore both necessary and worthwhile.
System performance parameters and the task-level scheduling algorithm both strongly affect how well a Hadoop system performs: the parameters govern how system resources are used at each stage of cluster operation, and the task-level scheduling algorithm determines how tasks are assigned. No unified model exists for choosing parameter values or for assigning tasks. Both are complex problems, and research on them is still developing. This thesis therefore studies Hadoop performance optimization from these two angles.
The thesis focuses on analyzing the execution capability of cluster nodes. To let a Hadoop cluster cope with varying tasks and with differences among its own nodes, it designs a TaskConfigure server and builds a Hadoop cluster parameter information system that tunes cluster parameters automatically. To address the uneven load distribution that can arise under the task-level scheduling algorithm Hadoop runs by default, it proposes an adaptive task allocation method based on node capability. The parameter information system uses node resource utilization efficiency to generate optimized configuration values for the cluster parameters, then classifies nodes and tasks and assigns each class its own configuration values, so that every node executes tasks under a suitable configuration. To keep the load on the worker nodes relatively balanced while tasks run, a weight ratio for each node is computed from node performance, task characteristics, node failure rate, and similar factors, and serves as the basis for how many tasks the node is assigned. Each node also evaluates its own load state and adaptively adjusts the number of tasks it runs according to that value. Experiments show that the total task completion time of the cluster is noticeably reduced, node loads are more balanced, node resources are used more reasonably, and the cluster has good stability and scalability.
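The node-capability-based allocation described in the abstract can be illustrated with a minimal sketch. The abstract names the inputs (node performance, task characteristics, node failure rate, load state) but gives no formulas, so everything concrete here is an assumption made for illustration: the `Node` fields, the product form of `node_weight`, the `task_affinity` factor, and the `high`/`low` load thresholds are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    performance: float   # relative processing capability (assumed benchmark score)
    failure_rate: float  # fraction of past tasks that failed on this node, in [0, 1)
    load: float          # current load state on an assumed 0..1 scale


def node_weight(node, task_affinity=1.0):
    """Hypothetical weight: faster and more reliable nodes score higher.

    task_affinity stands in for "task characteristics", e.g. how well the
    job type suits this node class (assumed, not from the thesis).
    """
    return node.performance * task_affinity * (1.0 - node.failure_rate)


def allocate(nodes, total_tasks):
    """Split total_tasks across nodes in proportion to their weights."""
    weights = {n.name: node_weight(n) for n in nodes}
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one node must have a positive weight")
    exact = {name: total_tasks * w / total for name, w in weights.items()}
    quota = {name: int(share) for name, share in exact.items()}
    # Hand out the tasks lost to rounding, largest fractional remainder first.
    leftover = total_tasks - sum(quota.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - quota[name], reverse=True)
    for name in by_remainder[:leftover]:
        quota[name] += 1
    return quota


def adjust_for_load(quota, node, high=0.8, low=0.3):
    """Adapt a node's task count to its own load state (assumed thresholds)."""
    if node.load > high:
        return max(1, quota - 1)  # overloaded: run one task fewer
    if node.load < low:
        return quota + 1          # underused: take one more task
    return quota


nodes = [
    Node("A", performance=2.0, failure_rate=0.0, load=0.9),
    Node("B", performance=1.0, failure_rate=0.0, load=0.5),
    Node("C", performance=1.0, failure_rate=0.5, load=0.1),
]
quotas = allocate(nodes, 10)
adjusted = {n.name: adjust_for_load(quotas[n.name], n) for n in nodes}
```

With these sample nodes, `quotas` is `{'A': 6, 'B': 3, 'C': 1}`: node C has the same raw performance as B but receives fewer tasks because half of its past tasks failed. The load step then moves A down to 5 and C up to 2, which mirrors the idea of each node correcting its own share at run time.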
[Degree-granting institution]: Liaoning Normal University
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP311.5
Article ID: 2442568
Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/2442568.html