Research on Data Storage and Management for the Huatu Online Library System Based on HDFS
Published: 2018-07-17 19:12
[Abstract]: As a platform where users share information, an online document library brings users efficiency and convenience. However, as user data and usage grow, library resources come in ever more forms and types, and the exponentially growing volume of data poses a challenge for the storage system. How to store and manage this data efficiently has become an urgent problem. The emergence of cloud storage technology makes efficient storage and management of such massive data possible. This thesis selects Hadoop, a currently very popular cloud platform, as the storage platform for the online library system, and uses HDFS, the cloud storage file system under Hadoop, to store and manage the system's document files. HDFS was designed to solve general-purpose data storage and management problems; applied as-is to an online library system it is not fit for practical use, so necessary improvements must be made. Documents in an online library system are generally text files such as Word, PDF, and TXT files. These files are relatively small: more than 90% of the documents are between 32 KB and 20 MB in size. HDFS keeps its metadata in the memory of the metadata node, so storing a massive number of small files leads to excessive memory consumption on the HDFS metadata node (NameNode), which in turn reduces the storage capacity of the whole HDFS system. This thesis therefore proposes a storage optimization scheme that merges small files into large files, effectively reducing memory consumption on the metadata node. In addition, considering the loss in access speed after merging, the thesis proposes a data prefetching mechanism comprising two levels of cache. These two cache levels can greatly improve the smoothness of file reads for users and relieve pressure on the cloud storage metadata management node. The thesis contains 22 figures, 3 tables, and 60 references.
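The two ideas in the abstract, merging small files behind an offset index and reading them back through a two-level cache with prefetching, can be sketched in a few lines of Python. This is only an illustrative in-memory model, not the thesis's implementation and not HDFS API code. The abstract does not say what each cache level holds, so the sketch assumes one level for index entries and one for file contents, and assumes that prefetching means loading the next file in merge order. All names here (`IndexEntry`, `merge_small_files`, `LRUCache`, `PrefetchingReader`) are hypothetical.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class IndexEntry:
    offset: int  # byte offset of the small file inside the merged blob
    length: int  # size of the small file in bytes


def merge_small_files(files: Dict[str, bytes]) -> Tuple[bytes, Dict[str, IndexEntry]]:
    """Concatenate small files into one blob and record where each one lives.

    One merged blob needs a single metadata entry on the metadata node,
    instead of one entry per small file.
    """
    parts = []
    index = {}
    offset = 0
    for name, data in files.items():
        index[name] = IndexEntry(offset, len(data))
        parts.append(data)
        offset += len(data)
    return b"".join(parts), index


class LRUCache:
    """Small least-recently-used cache built on OrderedDict."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used


class PrefetchingReader:
    """Reads small files out of a merged blob through two caches.

    Assumed levels (not stated in the abstract): an index cache that avoids
    repeated metadata lookups, and a data cache that holds file contents.
    """

    def __init__(self, blob: bytes, index: Dict[str, IndexEntry],
                 index_capacity: int = 1024, data_capacity: int = 64) -> None:
        self._blob = blob
        self._index = index  # stands in for a lookup on the metadata node
        self._names = list(index)  # files in merge order
        self._position = {name: i for i, name in enumerate(self._names)}
        self.index_cache = LRUCache(index_capacity)
        self.data_cache = LRUCache(data_capacity)
        self.blob_reads = 0  # counts reads that actually touched the blob

    def _locate(self, name: str) -> IndexEntry:
        entry = self.index_cache.get(name)
        if entry is None:
            entry = self._index[name]
            self.index_cache.put(name, entry)
        return entry

    def _load(self, name: str) -> bytes:
        entry = self._locate(name)
        self.blob_reads += 1
        data = self._blob[entry.offset:entry.offset + entry.length]
        self.data_cache.put(name, data)
        return data

    def read(self, name: str) -> Optional[bytes]:
        data = self.data_cache.get(name)
        if data is None:
            data = self._load(name)
            # Assumed prefetch policy: also load the next file in merge order.
            nxt = self._position[name] + 1
            if nxt < len(self._names):
                self._load(self._names[nxt])
        return data
```

With three documents merged in the order `a.txt`, `b.pdf`, `c.doc`, reading `a.txt` also pulls `b.pdf` into the data cache, so a following read of `b.pdf` is served without touching the merged blob again. How well such a policy works in practice depends on whether files merged together are also read together, which the abstract does not discuss.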
[Degree-granting institution]: Central South University
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP333
Article ID: 2130658
Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2130658.html