Design and Implementation of a Distributed Cloud Storage System Based on HDFS
Published: 2018-08-20 13:29
[Abstract]: With the rapid development of information technology, and of the mobile Internet and the Internet of Things in particular, data volumes have grown explosively and we have entered the era of massive data. Traditional storage management can no longer meet current storage needs, and how to store, manage, and maintain this data effectively has become a widely discussed problem.
The rapid development of cloud storage technology has made cloud storage a new kind of data storage solution. More and more developers and enterprises are migrating data to cloud platforms to lower data management and operations costs and to cope with the pressure of massive data. Cloud storage is still at a developing stage, however: the technologies involved, and the corresponding laws and regulations, are neither mature nor complete. Data stored in the cloud is therefore not completely safe. Unexpected incidents may well cause user data to be lost or confidential information to be leaked. Given these factors, an enterprise's sensitive and important internal data is not suited to storage on existing commercial cloud storage systems.
This thesis comprehensively analyzes the current state of cloud storage technology in China and abroad, draws on the technical approach of Amazon S3, which it regards as the most stable and mature cloud storage product at present, and takes into account the hardware storage devices that enterprises already own. On that basis it proposes a distributed cloud storage solution that is highly scalable, highly reliable, and compatible with different storage devices: a distributed cloud storage system based on HDFS. The system has three parts: an underlying data storage layer, a middle logic processing layer, and a front-end access layer. The whole system is built on the distributed file system HDFS and makes full use of its strengths in disaster recovery backup, fault tolerance and error correction, and data recovery. A file read/write module is designed and implemented on top of the underlying storage system. While remaining compatible with the Amazon S3 protocol, the thesis designs and implements a proxy module that handles front-end requests, a core business logic processing module, and a database-backed metadata storage module. Two access modes are provided: access from a Web browser front end and access through an SDK. To ensure the security and integrity of data requests in transit, a security control module is designed and implemented. The result is a distributed cloud storage system that is highly scalable, highly fault-tolerant, reliable, and secure. Finally, the thesis completes the distributed deployment and testing of the whole cloud storage system.
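The layering described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the thesis's actual design: the names `sign`, `ObjectMeta`, and `ObjectStore`, the `/buckets/<bucket>/<key>` path layout, and the HMAC-SHA256 signing scheme are hypothetical, and in-memory dictionaries stand in for HDFS and for the metadata database. The sketch only shows how a signed, S3-style put/get request might pass through a security check, a metadata lookup, and a read or write against the backing store.

```python
import hashlib
import hmac
from dataclasses import dataclass


def sign(secret: str, method: str, bucket: str, key: str, date: str) -> str:
    """Hypothetical request signature: HMAC-SHA256 over the request fields."""
    message = "\n".join([method, bucket, key, date]).encode("utf-8")
    return hmac.new(secret.encode("utf-8"), message, hashlib.sha256).hexdigest()


@dataclass
class ObjectMeta:
    path: str      # location in the backing store (an HDFS path in the thesis)
    size: int      # object size in bytes
    checksum: str  # SHA-256 of the content, used as an integrity check


class ObjectStore:
    """Toy three-layer store: security check, metadata table, blob storage."""

    def __init__(self, secrets: dict):
        self.secrets = secrets  # access key -> secret key
        self.blobs = {}         # stand-in for HDFS: path -> bytes
        self.meta = {}          # stand-in for the metadata DB: (bucket, key) -> ObjectMeta

    def _check(self, access_key, signature, method, bucket, key, date):
        # Security layer: recompute the signature and compare in constant time.
        secret = self.secrets.get(access_key)
        if secret is None:
            raise PermissionError("unknown access key")
        expected = sign(secret, method, bucket, key, date)
        if not hmac.compare_digest(expected, signature):
            raise PermissionError("signature mismatch")

    def put(self, access_key, signature, bucket, key, date, data: bytes) -> ObjectMeta:
        self._check(access_key, signature, "PUT", bucket, key, date)
        path = f"/buckets/{bucket}/{key}"  # assumed path layout
        self.blobs[path] = data            # write to the backing store
        record = ObjectMeta(path, len(data), hashlib.sha256(data).hexdigest())
        self.meta[(bucket, key)] = record  # record the object's metadata
        return record

    def get(self, access_key, signature, bucket, key, date) -> bytes:
        self._check(access_key, signature, "GET", bucket, key, date)
        record = self.meta[(bucket, key)]  # metadata lookup
        data = self.blobs[record.path]     # read from the backing store
        if hashlib.sha256(data).hexdigest() != record.checksum:
            raise IOError("checksum mismatch")
        return data
```

A client would sign each request with its secret key, for example `sign("SK", "PUT", "docs", "a.txt", date)`, and pass the result to `put`. A signature computed for one method is rejected when replayed against another, because the method is part of the signed message.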
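[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP333
Article ID: 2193784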
Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2193784.html