Design and Implementation of a Data Storage System Based on Cloud Storage

Published: 2018-07-05 18:44

Topics: HDFS + distributed file system; Source: Beijing University of Posts and Telecommunications, 2012 master's thesis

[Abstract]: With the rapid development of information technology, data storage has become increasingly important. As data volumes grow exponentially, local storage is less and less able to keep up because of limits on capacity, cost, and security. This has led to ever wider use of cloud storage backed by distributed file systems, among which the Hadoop Distributed File System (HDFS) has attracted wide attention for its strong fault tolerance and scalability. However, because HDFS is modeled on the Google File System (GFS), it serves search-engine workloads well but needs further research and improvement before it can be used for general-purpose distributed storage. In search-engine workloads most files are large, whereas in general-purpose storage file sizes vary widely. Moreover, because the single name node (Namenode) in HDFS is a performance bottleneck, data access degrades when files are split into too many blocks. Therefore, although HDFS has many advanced features, its original design goals mean that it is not a general-purpose distributed file system and can support only a limited range of applications. The aim of this thesis is to design and implement a general-purpose distributed file system for cloud storage. First, it proposes a distributed file system architecture with multiple Namenodes. File metadata is distributed across the Namenodes, and each Namenode stores only the file-to-block mapping, while block location information is stored by the data node manager (DatanodeManager), which reduces the load on the Namenodes. The thesis then focuses on the implementation and key algorithms of the Datanode cluster. In particular, it redesigns and implements the block-splitting strategy of the Datanode cluster. Under this strategy, several block-size factors are available, and the system splits each file flexibly according to information such as the application type and the file's attributes, so that it provides good access performance for the various applications running on a cloud platform.
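The split of responsibilities described in the abstract can be illustrated with a small in-memory sketch. It is not the thesis's implementation: the candidate block sizes, the `target_blocks` rule, the hash-based choice of Namenode, and the round-robin single-replica placement are all assumptions made only for illustration, and the names `Cluster`, `choose_block_size`, and `locate` are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical candidate block sizes in bytes; the thesis's actual factors are not given here.
BLOCK_SIZES = [1 << 20, 16 << 20, 64 << 20]


def choose_block_size(file_size, target_blocks=8):
    """Pick the smallest candidate size that keeps the block count at or
    below target_blocks (an illustrative rule, not the thesis's algorithm)."""
    for size in BLOCK_SIZES:
        if -(-file_size // size) <= target_blocks:  # ceiling division
            return size
    return BLOCK_SIZES[-1]


@dataclass
class Namenode:
    # Stores only path -> ordered list of block IDs; no location data.
    files: dict = field(default_factory=dict)


@dataclass
class DatanodeManager:
    # Stores block ID -> list of datanodes holding that block.
    locations: dict = field(default_factory=dict)


class Cluster:
    def __init__(self, num_namenodes, datanodes):
        self.namenodes = [Namenode() for _ in range(num_namenodes)]
        self.manager = DatanodeManager()
        self.datanodes = list(datanodes)
        self._next_block = 0

    def namenode_for(self, path):
        # Assumption: the namespace is partitioned by a hash of the path.
        digest = hashlib.md5(path.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % len(self.namenodes)
        return self.namenodes[index]

    def create(self, path, file_size):
        size = choose_block_size(file_size)
        count = max(1, -(-file_size // size))
        block_ids = []
        for _ in range(count):
            block_id = self._next_block
            self._next_block += 1
            # Assumption: round-robin placement with a single replica.
            node = self.datanodes[block_id % len(self.datanodes)]
            self.manager.locations[block_id] = [node]
            block_ids.append(block_id)
        self.namenode_for(path).files[path] = block_ids
        return size, block_ids

    def locate(self, path):
        # Two-step lookup: Namenode for block IDs, DatanodeManager for locations.
        block_ids = self.namenode_for(path).files[path]
        return [(b, self.manager.locations[b]) for b in block_ids]
```

In this sketch a read first asks the responsible Namenode for the block IDs of a path and then asks the DatanodeManager where those blocks live, so the Namenodes never hold location state. A small file gets a small block size and a large file a large one, which keeps the number of blocks per file bounded.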
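[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: TP333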
