Research on Small-File Processing and Replica Strategy Optimization Based on HDFS

Published: 2018-04-21 11:35

  Topic: HDFS + small-file processing; Source: Ocean University of China, master's thesis, 2014


[Abstract]: As an open-source implementation of GFS, the Hadoop Distributed File System (HDFS) performs well on large files but handles small files inefficiently. The main reason is that massive numbers of small files consume a great deal of NameNode memory, so the single NameNode easily becomes the performance bottleneck of the whole cluster. In addition, HDFS uses a static three-replica strategy and determines replica locations in a rack-aware manner. Although this strategy provides a degree of fault tolerance and load balancing, its drawbacks are obvious: it is too rigid, wastes considerable storage, and balances load poorly.

To address the shortcomings of HDFS in handling small files, this thesis proposes an index-based optimization scheme for small-file processing. The core idea is to let DataNodes take over part of the NameNode's role, spreading the load of small-file processing and relieving the single-NameNode bottleneck under large numbers of requests. A caching strategy is also introduced to further improve file-read efficiency. In addition, to achieve balanced storage, the thesis proposes a composite quantitative metric for DataNodes, builds a dynamic replica strategy on top of it, and implements a dynamic replica placement algorithm. The main contributions are as follows:

1. To address the inefficiency of HDFS in handling small files, the thesis proposes a more general index-based optimization scheme that processes small files in a distributed manner, reduces the NameNode bottleneck, and improves small-file processing efficiency.
2. Building on the index scheme, the thesis introduces caching into the file-read path, implements distributed independent caches, optimizes HDFS I/O operations, and increases HDFS file-read speed.
3. To address the low storage efficiency and uneven storage distribution caused by the original static three-replica strategy, the thesis proposes a new dynamic replica strategy that quantifies DataNode performance with a composite of several metrics and implements a dynamic replica placement algorithm, improving cluster balance and storage efficiency.

Experimental results on a test cluster show that both the index-based small-file optimization scheme and the dynamic replica strategy improve performance considerably over the original HDFS and also show a clear advantage over existing optimization schemes.
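The abstract gives no implementation details, so the following is only a minimal illustrative sketch of the general idea it describes: pack many small files into one container, locate each file through an index of (offset, length) entries, and serve repeated reads from a cache. The class `SmallFileStore`, its methods, and the choice of an LRU eviction policy are all assumptions made for illustration; none of them come from the thesis or from any HDFS API.

```python
from collections import OrderedDict


class SmallFileStore:
    """Toy in-memory model of index-based small-file packing with a read cache.

    Hypothetical illustration only: not the thesis's design and not an HDFS API.
    """

    def __init__(self, cache_capacity=2):
        self._container = bytearray()   # stands in for one large stored block
        self._index = {}                # file name -> (offset, length)
        self._cache = OrderedDict()     # LRU cache: file name -> bytes
        self._cache_capacity = cache_capacity
        self.container_reads = 0        # reads that missed the cache

    def put(self, name, data):
        """Append a small file to the container and record its location.

        Writing an existing name appends a new copy; the old bytes are orphaned.
        """
        self._index[name] = (len(self._container), len(data))
        self._container.extend(data)
        self._cache.pop(name, None)     # drop any stale cached copy

    def get(self, name):
        """Return file contents, serving repeated reads from the LRU cache."""
        if name in self._cache:
            self._cache.move_to_end(name)       # mark as most recently used
            return self._cache[name]
        offset, length = self._index[name]      # KeyError if the file is unknown
        self.container_reads += 1
        data = bytes(self._container[offset:offset + length])
        self._cache[name] = data
        if len(self._cache) > self._cache_capacity:
            self._cache.popitem(last=False)     # evict the least recently used
        return data
```

In this toy model a central metadata service would only need to track the container, while the per-file index and cache could live alongside the data. That placement is an assumption here, loosely mirroring the abstract's idea of shifting small-file work away from the NameNode.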
[Degree-granting institution]: Ocean University of China
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP393.06



