
Research and Optimization of Storage Strategies on the Hadoop Platform

Published: 2018-04-02 16:33

Topic: cloud computing; Focus: Hadoop; Source: Beijing Jiaotong University, master's thesis, 2012


[Abstract]: With economic, social, and technological development, digital information is growing explosively. The spread of information technology and the Internet, together with inexpensive storage devices, provides both the motivation and the physical basis for mass information storage. Small volumes of data are easy to store and back up, but once volumes reach the terabyte or even petabyte scale, storage and backup become a hard problem, and expectations for storage efficiency and data security keep rising. How to store and read data efficiently has therefore become a focus of attention. Cloud computing is a comparatively mature approach and an effective answer to data storage and data security, improving both the security of data and the speed of storage. Hadoop is a popular cloud computing framework that offers high reliability, efficiency, scalability, and fault tolerance. It is also open source, which makes it well suited to research and application, so this thesis takes Hadoop as its object of study within cloud computing.

To address the efficient storage of massive data, this thesis analyzes the principles and storage strategy of HDFS (Hadoop Distributed File System), draws on problems encountered when running the Hadoop platform in practice, identifies the limitations and shortcomings of the HDFS data storage strategy, and proposes an optimized storage strategy for HDFS called DIFT (Dstat Iostat Free Top). DIFT bases its decisions on more complete state information about the data nodes. This raises the utilization of the cluster's disk and network bandwidth, reduces the likelihood of bottlenecks, improves system performance, and gives the cluster better load balancing and a better user experience.

The main work is as follows. First, the principles of the HDFS model are studied in detail in terms of the control node, the data nodes, the data structures of file blocks, and the call relationships among interfaces, classes, and methods, in order to explain how HDFS operates and how its functions are implemented. Second, the implementation of the DIFT storage strategy is designed in terms of data structures, state information, and the heartbeat protocol. Finally, Hadoop is compiled with the DIFT storage strategy included and deployed on a Hadoop cluster, and experiments verify and test the effect of the strategy. DIFT is configurable: its design takes the particulars of each user's situation into account, so users can set a policy configuration that matches their actual needs. The experiments show that DIFT improves the storage efficiency of HDFS and enables the platform to handle the storage of massive data efficiently.

Running HDFS on a stable Hadoop cloud platform built from inexpensive machines and configured with the DIFT storage strategy meets the needs of practical applications well and is fully capable of serving as a data center platform for enterprises and schools. Because the strategy is configurable, users only need to set policies and thresholds that match their application, which shortens the development cycle for enterprises and schools.
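The abstract does not give DIFT's actual selection rule, so the following is only a minimal sketch of the general idea it describes: choosing target data nodes from richer per-node state, with user-configurable thresholds. Every name here (`NodeStatus`, `choose_targets`, the four utilization fields, the default thresholds and weights) is a hypothetical illustration, not taken from the thesis or from the Hadoop API. The fields merely mirror the kind of figures that dstat, iostat, free, and top report.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class NodeStatus:
    """Hypothetical per-datanode snapshot, e.g. as carried in a heartbeat."""
    name: str
    net_util: float   # fraction of network bandwidth in use (dstat-style)
    disk_util: float  # fraction of time the disk is busy (iostat-style)
    mem_util: float   # fraction of memory in use (free-style)
    cpu_util: float   # fraction of CPU in use (top-style)


# Assumed defaults; in a configurable strategy the user would override these.
DEFAULT_THRESHOLDS: Dict[str, float] = {
    "net_util": 0.90, "disk_util": 0.90, "mem_util": 0.95, "cpu_util": 0.95,
}
DEFAULT_WEIGHTS: Dict[str, float] = {
    "net_util": 0.30, "disk_util": 0.40, "mem_util": 0.15, "cpu_util": 0.15,
}


def load_score(node: NodeStatus, weights: Dict[str, float]) -> float:
    """Weighted sum of utilizations; lower means the node is less loaded."""
    return sum(getattr(node, metric) * w for metric, w in weights.items())


def choose_targets(
    nodes: List[NodeStatus],
    replicas: int,
    thresholds: Optional[Dict[str, float]] = None,
    weights: Optional[Dict[str, float]] = None,
) -> List[NodeStatus]:
    """Drop nodes over any threshold, then return the least-loaded ones."""
    thresholds = DEFAULT_THRESHOLDS if thresholds is None else thresholds
    weights = DEFAULT_WEIGHTS if weights is None else weights
    eligible = [
        n for n in nodes
        if all(getattr(n, metric) <= limit for metric, limit in thresholds.items())
    ]
    # Sort by score, breaking ties by name so the result is deterministic.
    eligible.sort(key=lambda n: (load_score(n, weights), n.name))
    return eligible[:replicas]


cluster = [
    NodeStatus("dn1", net_util=0.20, disk_util=0.30, mem_util=0.50, cpu_util=0.40),
    NodeStatus("dn2", net_util=0.95, disk_util=0.10, mem_util=0.10, cpu_util=0.10),
    NodeStatus("dn3", net_util=0.10, disk_util=0.10, mem_util=0.20, cpu_util=0.20),
    NodeStatus("dn4", net_util=0.50, disk_util=0.80, mem_util=0.60, cpu_util=0.60),
]
print([n.name for n in choose_targets(cluster, replicas=2)])  # ['dn3', 'dn1']
```

In this sketch dn2 is excluded because its network utilization exceeds the assumed 0.90 threshold, even though its other figures are low. That is the kind of bottleneck a richer status report is meant to avoid. A real placement policy would also have to respect rack topology and free disk space, which the sketch leaves out.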
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year conferred]: 2012
[Classification number]: TP333






