Research on Data Security of an HDFS-Based Cloud Storage System

Published: 2018-06-15 22:51

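Topic of this paper: Hadoop Distributed File System + Namenode; Reference: Beijing University of Posts and Telecommunications, 2013 master's thesis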

[Abstract]: HDFS (Hadoop Distributed File System) is Hadoop's distributed file system. In Hadoop 0.20.203, HDFS uses a master-slave architecture consisting mainly of one Namenode, one SecondaryNamenode, and many Datanodes. As the single master server node of HDFS, the Namenode is a single point of failure and a performance bottleneck, and it is hard to scale. In addition, HDFS is designed to build a distributed file storage cluster out of inexpensive hosts and servers, so hardware failure is the norm.
To address these problems, this thesis carries out the following work:
1. It introduces the basic concepts of HDFS and surveys its development history, existing problems, and current state of research.
2. It describes the HDFS system components in detail, including the Namenode and the Datanode, and studies in depth the metadata, the organization and interaction of data, and data maintenance.
3. It proposes a new distributed Namenode cluster scheme, which redistributes the functions of the original Namenode. The Namenode1 cluster mainly handles client requests and manages the state of the Datanodes. The Namenode2 cluster mainly manages and persists metadata and maintains the mapping between Datanodes and data blocks. The Leader node mainly forwards client requests, monitors the running state of the whole cluster, and returns the response results. Components such as DRBD and Pacemaker are studied in depth, and the shortcomings of existing systems are analyzed. In addition, a Linux HA deployment is verified for a single-node Namenode.
4. It introduces the data redundancy techniques commonly used in distributed systems. The data redundancy mechanism of HDFS is studied in detail and verified experimentally, and its influence on data interaction and load balancing is studied in depth.
5. It summarizes the thesis and proposes some aspects that remain to be improved.
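
The division of roles described in item 3 can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the thesis's implementation: the class names `MetadataCluster`, `RequestCluster`, and `Leader`, their methods, and the in-memory dictionaries are all hypothetical, and the abstract does not specify the actual protocol, persistence, or failover logic.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataCluster:
    """Hypothetical stand-in for the Namenode2 role: keeps file-to-block
    metadata and the block-to-Datanode mapping (in memory only here)."""
    file_blocks: dict = field(default_factory=dict)
    block_locations: dict = field(default_factory=dict)

    def record_file(self, path, placements):
        # placements maps each block id to the Datanodes holding a replica.
        self.file_blocks[path] = list(placements)
        self.block_locations.update(placements)

    def locate(self, path):
        return {b: self.block_locations[b] for b in self.file_blocks[path]}


@dataclass
class RequestCluster:
    """Hypothetical stand-in for the Namenode1 role: serves client requests
    and tracks which Datanodes are currently considered alive."""
    metadata: MetadataCluster
    live_datanodes: set = field(default_factory=set)

    def heartbeat(self, datanode):
        self.live_datanodes.add(datanode)

    def mark_dead(self, datanode):
        self.live_datanodes.discard(datanode)

    def open_file(self, path):
        # Return only the replicas that sit on live Datanodes.
        located = self.metadata.locate(path)
        return {
            block: [dn for dn in nodes if dn in self.live_datanodes]
            for block, nodes in located.items()
        }


@dataclass
class Leader:
    """Hypothetical stand-in for the Leader role: forwards a client request
    to the request-handling cluster and returns its response."""
    request_cluster: RequestCluster

    def handle(self, op, path):
        if op != "open":
            raise ValueError(f"unsupported operation: {op}")
        return self.request_cluster.open_file(path)


if __name__ == "__main__":
    meta = MetadataCluster()
    meta.record_file("/a.txt", {"blk_1": ["dn1", "dn2", "dn3"]})
    nn1 = RequestCluster(meta)
    for dn in ("dn1", "dn2", "dn3"):
        nn1.heartbeat(dn)
    nn1.mark_dead("dn2")
    print(Leader(nn1).handle("open", "/a.txt"))  # {'blk_1': ['dn1', 'dn3']}
```

In this sketch the client talks only to the Leader, and the replica list it receives combines metadata held by one cluster with liveness information held by the other, which is the separation of responsibilities the abstract describes.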
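[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP333; TP309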

[References]