Design and Implementation of a Distributed File Storage Service Platform Based on Hadoop
Published: 2018-01-19 06:15
Keywords: distributed file storage; Hadoop; redundant backup; quality-of-service awareness; cloud storage. Source: Zhejiang University, Master's thesis, 2012. Thesis type: degree thesis.
【Abstract】: With the rapid development of Internet applications, the amount of information and data on the Internet is growing explosively. How to organize and store such large-scale data efficiently and securely, while keeping application costs as low as possible, has drawn increasing attention from academia and industry at home and abroad. At present, whether in the Internet at large, in the intranets of medium-sized enterprises, or in small local area networks, there are large numbers of high-performance yet inexpensive idle storage resources. Making full use of these idle, low-cost resources to build trustworthy, high-quality large-scale storage pools is an effective way to address this problem.

Distributed file systems provide one route to exploiting scattered storage resources effectively. However, traditional distributed file storage systems, such as HDFS in the Hadoop project, are designed to run on clusters whose nodes have similar performance and whose network environment is highly stable. Deploying such a system directly in a network whose conditions change dynamically and whose storage nodes join and leave freely therefore leads to low space utilization, poor adaptability to network dynamics, and low storage-node trustworthiness. Building on the Hadoop open-source system, this thesis studies a generalized distributed file storage service model suited to wide-area networks, and designs and implements QDFS, a distributed file storage service platform based on an efficient redundant backup strategy and quality-of-service awareness. The main contributions are as follows:

(1) The distributed file storage system is built on a dynamic network environment, making full use of the inexpensive computing resources available in the network and reducing the total cost of ownership of the storage service.
(2) A redundant backup mechanism based on recovery volumes is proposed, which greatly reduces the storage space consumed by file redundancy information and lowers file maintenance costs.
(3) A tree-structured storage system model based on hierarchical name nodes is established, removing the bottleneck that different clusters cannot share a single distributed file system.
(4) A file access client is designed that solves the problem of running the Hadoop client in a Windows environment.
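The abstract does not spell out how recovery volumes work in QDFS. As a point of reference only, the Java sketch below illustrates the general idea behind parity-style recovery-volume redundancy: one XOR block protects a group of data blocks, so any single lost block can be rebuilt without storing full replicas. All class and method names are hypothetical and are not taken from the thesis.

```java
import java.util.Arrays;

/**
 * Illustrative sketch of parity-style "recovery volume" redundancy:
 * one XOR block per group of k equal-sized data blocks lets any single
 * lost block be rebuilt, at roughly 1/k extra space instead of full replication.
 */
public class RecoveryVolumeSketch {

    /** XOR-combine a group of equal-sized data blocks into one recovery block. */
    static byte[] buildRecoveryBlock(byte[][] dataBlocks) {
        byte[] recovery = new byte[dataBlocks[0].length];
        for (byte[] block : dataBlocks) {
            for (int i = 0; i < recovery.length; i++) {
                recovery[i] ^= block[i];
            }
        }
        return recovery;
    }

    /** Rebuild one missing block from the surviving blocks and the recovery block. */
    static byte[] rebuildMissingBlock(byte[][] survivingBlocks, byte[] recoveryBlock) {
        byte[] rebuilt = recoveryBlock.clone();
        for (byte[] block : survivingBlocks) {
            for (int i = 0; i < rebuilt.length; i++) {
                rebuilt[i] ^= block[i];
            }
        }
        return rebuilt;
    }

    public static void main(String[] args) {
        byte[][] group = {
                "block-0 data".getBytes(),
                "block-1 data".getBytes(),
                "block-2 data".getBytes()
        };
        byte[] recovery = buildRecoveryBlock(group);

        // Simulate losing block 1 and rebuilding it from the survivors plus the recovery block.
        byte[][] survivors = {group[0], group[2]};
        byte[] rebuilt = rebuildMissingBlock(survivors, recovery);
        System.out.println(Arrays.equals(rebuilt, group[1])); // true
    }
}
```

Compared with full triple replication, a scheme of this kind stores only a fraction of extra data per block group, which is consistent with the abstract's claim of greatly reduced redundancy storage, at the cost of extra computation whenever a lost block must be rebuilt.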
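Contribution (3) describes a tree of hierarchical name nodes that lets several clusters appear as one storage system. The thesis design is not detailed here; the following minimal Java sketch only illustrates one plausible reading, in which a parent node keeps a mount table from namespace prefixes to per-cluster name-node addresses. Every identifier and address below is a hypothetical placeholder.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Minimal sketch of hierarchical name-node routing: a parent node maps
 * namespace prefixes to the addresses of per-cluster name nodes, so
 * several clusters can be presented as one logical file tree.
 */
public class HierarchicalNameNodeSketch {

    // Mount table: namespace prefix -> address of the cluster's name node.
    private final TreeMap<String, String> mountTable = new TreeMap<>();

    void mount(String prefix, String nameNodeAddress) {
        mountTable.put(prefix, nameNodeAddress);
    }

    /** Resolve a file path to the responsible name node (longest matching prefix wins). */
    String resolve(String path) {
        String best = null;
        for (Map.Entry<String, String> e : mountTable.entrySet()) {
            if (path.startsWith(e.getKey()) && (best == null || e.getKey().length() > best.length())) {
                best = e.getKey();
            }
        }
        if (best == null) {
            throw new IllegalArgumentException("No cluster mounted for " + path);
        }
        return mountTable.get(best);
    }

    public static void main(String[] args) {
        HierarchicalNameNodeSketch root = new HierarchicalNameNodeSketch();
        root.mount("/clusterA/", "hdfs://namenode-a:9000");
        root.mount("/clusterB/", "hdfs://namenode-b:9000");
        System.out.println(root.resolve("/clusterA/datasets/logs.txt")); // hdfs://namenode-a:9000
    }
}
```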
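Contribution (4) concerns a file access client that runs on Windows. The abstract gives no implementation details, but any client built on Hadoop would normally go through the standard org.apache.hadoop.fs.FileSystem API; the sketch below shows a plain HDFS read/write round trip with that API. The name-node address and file paths are placeholders, and the assumption that QDFS wraps calls of this kind is the editor's, not a statement from the thesis.

```java
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

/**
 * Minimal HDFS read/write round trip through the standard Hadoop client API.
 * Addresses and paths are placeholders for illustration only.
 */
public class HdfsClientSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // placeholder name-node address

        try (FileSystem fs = FileSystem.get(conf)) {
            Path file = new Path("/demo/hello.txt");

            // Write a small file, overwriting any existing copy.
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.write("hello from a Hadoop client".getBytes(StandardCharsets.UTF_8));
            }

            // Read it back to standard output.
            try (InputStream in = fs.open(file)) {
                IOUtils.copyBytes(in, System.out, 4096, false);
            }
        }
    }
}
```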
【Degree-granting institution】: Zhejiang University
【Degree level】: Master's
【Year conferred】: 2012
【Classification number】: TP393.09; TP333
Document ID: 1442990
Link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/1442990.html