Design and Implementation of a Hadoop-Based File Synchronization Storage System
Published: 2019-03-06 13:17
[Abstract]: In the cloud computing era, as networked terminal devices become ubiquitous and Internet technology spreads further, data storage and backup have become closely tied to personal life and to the operation of organizations, and both enterprises and individuals face the problem of managing massive amounts of data. The development of cloud storage and related technologies has transformed the data storage field. Online storage systems based on cloud storage can offer users permanent, scalable, convenient, and inexpensive data storage and backup services. The more mature storage service products currently available in China include Kingsoft KuaiPan (金山快盘) and Huawei Netdisk (华为网盘). They provide stable data storage and file synchronization, but some problems remain. First, the file system monitoring provided by the client is incomplete. Second, file synchronization is inefficient in some cases. Third, some products provide neither secure data transmission nor transmission strategies differentiated by synchronization event type. Finally, existing products do not yet offer encrypted storage of data on the client and the server. Optimizing the cloud storage platform that underpins the stored data is a further problem that vendors of cloud-based synchronization storage services should work to solve.
Taking the perspective of users of online synchronization storage services, this thesis summarizes the main functions of current products and the problems they exhibit. Starting from these requirements and problems, it studies the key technologies needed to build a file synchronization storage system on cloud storage, and it designs and implements such a system with a cloud storage back end built on Hadoop and a synchronization protocol based on the Rsync algorithm. The main work includes: analyzing the strengths and weaknesses of comparable products in China and abroad to clarify user requirements; using the open-source jpathwatch library to monitor changes to the client's virtual disk in real time, so that different types of synchronization events are triggered and reported as they occur, with added monitoring of file moves and file renames; classifying synchronization events so that each type is handled separately and, for file content updates and resumed transfers in particular, designing an Rsync-based synchronization protocol that reduces the amount of data exchanged between the two parties and improves synchronization efficiency; designing the most suitable data transfer method for each kind of synchronization task and using HTTPS for encrypted transmission; and storing the data in a Hadoop-based cloud storage back end.
The system is designed and implemented with a layered, modular approach. The final two chapters of the thesis test and analyze the system's functional modules, summarize the research results and the system's extensible functions, and outline future work.
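The abstract does not spell out the synchronization protocol, so the following is only a minimal sketch of the general Rsync idea it builds on: the receiver describes its old copy as per-block checksums, and the sender slides a rolling weak checksum over the new file to find blocks it need not resend. The class and method names, the choice of MD5 as the confirming hash, and the in-memory op list are all assumptions made for illustration, not the thesis's design.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from the thesis.
final class RsyncDeltaSketch {
    static final int MOD = 1 << 16;

    // Weak checksum of data[off, off + len), computed from scratch.
    static int weak(byte[] data, int off, int len) {
        int a = 0, b = 0;
        for (int i = 0; i < len; i++) {
            int x = data[off + i] & 0xFF;
            a = (a + x) % MOD;
            b = (b + (len - i) * x) % MOD;
        }
        return (b << 16) | a;
    }

    // Slides the window one byte to the right in O(1).
    static int roll(int sum, int len, byte out, byte in) {
        int a = sum & 0xFFFF, b = sum >>> 16;
        a = Math.floorMod(a - (out & 0xFF) + (in & 0xFF), MOD);
        b = Math.floorMod(b - len * (out & 0xFF) + a, MOD);
        return (b << 16) | a;
    }

    // Strong hash that confirms a weak-checksum hit (MD5 is an assumption here).
    static String strong(byte[] data, int off, int len) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(data, off, len);
            return Arrays.toString(md.digest());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Ops: an Integer means "copy old block i"; a byte[] carries literal bytes.
    static List<Object> delta(byte[] oldData, byte[] newData, int blockSize) {
        // Signatures of the full-size blocks of the old copy. If two old blocks
        // share a weak checksum, only the last is kept, which costs efficiency
        // but not correctness because every hit is confirmed by the strong hash.
        Map<Integer, Integer> weakToIndex = new HashMap<>();
        List<String> strongByIndex = new ArrayList<>();
        for (int i = 0; (i + 1) * blockSize <= oldData.length; i++) {
            weakToIndex.put(weak(oldData, i * blockSize, blockSize), i);
            strongByIndex.add(strong(oldData, i * blockSize, blockSize));
        }
        List<Object> ops = new ArrayList<>();
        ByteArrayOutputStream literal = new ByteArrayOutputStream();
        int pos = 0, sum = 0;
        boolean haveSum = false;
        while (pos + blockSize <= newData.length) {
            if (!haveSum) {
                sum = weak(newData, pos, blockSize);
                haveSum = true;
            }
            Integer idx = weakToIndex.get(sum);
            if (idx != null && strongByIndex.get(idx).equals(strong(newData, pos, blockSize))) {
                if (literal.size() > 0) {
                    ops.add(literal.toByteArray());
                    literal.reset();
                }
                ops.add(idx);          // block already exists on the other side
                pos += blockSize;
                haveSum = false;
            } else {
                literal.write(newData[pos]);
                if (pos + blockSize < newData.length) {
                    sum = roll(sum, blockSize, newData[pos], newData[pos + blockSize]);
                }
                pos++;
            }
        }
        literal.write(newData, pos, newData.length - pos); // unmatched tail
        if (literal.size() > 0) ops.add(literal.toByteArray());
        return ops;
    }

    // Rebuilds the new file from the old copy plus the delta ops.
    static byte[] patch(byte[] oldData, List<Object> ops, int blockSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Object op : ops) {
            if (op instanceof Integer) {
                int idx = (Integer) op;
                out.write(oldData, idx * blockSize, blockSize);
            } else {
                byte[] lit = (byte[]) op;
                out.write(lit, 0, lit.length);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] oldData = "the quick brown fox jumps over the lazy dog".getBytes(StandardCharsets.US_ASCII);
        byte[] newData = "the quick brown cat jumps over the lazy dog".getBytes(StandardCharsets.US_ASCII);
        List<Object> ops = delta(oldData, newData, 4);
        System.out.println("ops: " + ops.size() + ", round trip ok: "
                + Arrays.equals(patch(oldData, ops, 4), newData));
    }
}
```

In a real client and server the block signatures and the op list would travel over the network (HTTPS in the system described here), and the block size would be far larger than the 4 bytes used in `main` for readability.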
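[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: TP333
Article ID: 2435552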
Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2435552.html