Optimization and Implementation of a Cloud Storage Platform Based on HDFS

Published: 2018-10-11 12:21

[Abstract]: Cloud computing is a major topic of current research, and cloud storage, which grew out of cloud computing, has become one of the most active research areas in China and abroad. The Hadoop Distributed File System (HDFS), an open-source implementation of the Google File System, has become a standard reference model in industry for research on cloud computing and cloud storage and for building cloud applications and services. The existing HDFS architecture nevertheless has shortcomings, typically its weak support for small files and the tendency of the single NameNode to become the performance bottleneck of the whole cluster. Building on a study of the existing HDFS, this thesis proposes solutions to both problems. For the small-file problem, it introduces a user metadata space so that small files in HDFS are merged and stored as large files. For the single-NameNode bottleneck, it proposes a multi-NameNode scheme based on MongoDB. Experimental results show that the proposed schemes both expand the namespace of the HDFS cluster and improve the concurrent read and write speed of HDFS. Beyond optimizing the HDFS architecture, this thesis also builds a cloud storage system on top of HDFS that supports file upload, download, sharing, and browsing. The system can also monitor the HDFS cluster; the monitored information includes cluster capacity, cluster block information, and per-node load and CPU usage. The implementation of this cloud storage system offers exploratory and guiding value for applications based on HDFS.
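The abstract does not give the details of the small-file scheme, so the following is only a minimal sketch of the general idea it names: many small files are appended into one large container, and a per-user index records where each file lives. The class name `MergedStore`, the in-memory `BytesIO` container, and the `(offset, length)` index layout are all assumptions made for illustration; they are not taken from the thesis and do not use any HDFS API.

```python
import io
from dataclasses import dataclass, field


@dataclass
class MergedStore:
    """Hypothetical container: small files are appended into one large blob."""

    # Stand-in for a single large file; a real system would write to HDFS instead.
    blob: io.BytesIO = field(default_factory=io.BytesIO)
    # Hypothetical per-user metadata space: user -> {file name: (offset, length)}.
    index: dict = field(default_factory=dict)

    def put(self, user: str, name: str, data: bytes) -> None:
        # Move to the end of the blob; seek() returns the new absolute position.
        offset = self.blob.seek(0, io.SEEK_END)
        self.blob.write(data)
        self.index.setdefault(user, {})[name] = (offset, len(data))

    def get(self, user: str, name: str) -> bytes:
        # Look up the file's location in the user's metadata, then read that slice.
        offset, length = self.index[user][name]
        self.blob.seek(offset)
        return self.blob.read(length)


store = MergedStore()
store.put("alice", "a.txt", b"hello")
store.put("alice", "b.txt", b"world!")
store.put("bob", "a.txt", b"separate namespace")
```

In this sketch the large container holds a single entry from the file system's point of view, while the per-file bookkeeping lives in the index; that is the property that would relieve pressure on a NameNode, under the assumptions stated above.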
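[Degree-granting institution]: South China University of Technology
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: TP333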
