Design and Optimization of a Cloud Storage System for Massive Numbers of Users

Published: 2018-11-08 18:05
[Abstract]: With the continuing advance of information technology and the spread of mobile networks, the volume of data held by individual users has grown rapidly, and demand for cloud storage services has risen sharply. Many well-known companies have moved into developing and operating cloud storage services for individual users, including Google, Microsoft, Dropbox, Lenovo, Kingsoft, Huawei, Baidu, and telecom operators. Most of these products adopt Hadoop HDFS as the underlying file system and customize it. Open-source Hadoop HDFS has become popular worldwide and a hot research topic thanks to its architecture, high scalability, availability, reliability, fault tolerance, cost-effectiveness, and strong performance. HDFS nevertheless still has many unresolved shortcomings: the NameNode is a single-point bottleneck, small files are handled poorly, redundant files are not stored by reference, there is no load balancing at the user layer, resumable file transfer is not supported, system security is weak, and there is no mechanism for encrypted data storage or for authorizing sharing.

These shortcomings can be addressed in two ways. The first is to modify the HDFS source code, improving it from the inside. The second is to add a service layer on top of HDFS, moving some functions out of HDFS and simplifying HDFS itself. The first approach requires major changes to HDFS: the engineering effort is large, the difficulty is high, compatibility across HDFS versions is lost, and it still cannot effectively solve the single-point bottleneck, resumable transfer, or file encryption and authorization. The second approach has fewer constraints, is easier, requires less engineering, and is compatible with all HDFS versions. More importantly, a cloud storage system built this way leaves ample room for improvement, can solve these problems, and is highly extensible. This thesis therefore adopts the second approach.

This thesis analyzes the existing HDFS architecture and, on that basis, builds a cloud storage system architecture for massive numbers of users. The architecture offers optimized solutions to many of the problems in HDFS and ensures data security and the protection of user privacy. The main contributions are as follows:

1. An HDFS-based cloud storage system architecture for massive numbers of users is proposed and its advantages are analyzed: it effectively relieves the single-point bottleneck, strengthens system security and scalability, supports multiple access protocols, and is compatible with all HDFS versions.

2. A complete security protection mechanism is proposed. First, a client login verification method that can withstand a Trojan-infected environment strengthens the security of user accounts. Second, a scheme for encrypted file storage and tiered authorization management secures user data and makes it easy to grant and revoke file authorizations. Third, an access control policy suited to cloud storage services better guarantees access security. Analysis shows that the mechanism not only improves system security but also achieves user privacy protection and secure access control. Combined with SSL/TLS for encrypted communication, the overall security of the system is greatly strengthened.

3. A scheduling mechanism for load balancing among application servers is presented, which balances access requests at the user layer. Application servers can join or leave the cluster at any time, which avoids a single point of failure. Balanced scheduling of access requests works together with cache management on the application servers to improve system performance and load capacity.

4. Optimizations tailored to the characteristics of HDFS and of massive user populations are proposed, for example: adding resumable transfer, packing small files for storage, handling redundant large files by reference, adding caches to the application servers, and mapping the file container structure onto the HDFS structure.

Compared with the original HDFS system, the proposed approach adds functionality while also improving system performance and security.
[Degree-granting institution]: East China Normal University
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP333; TP309
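
One of the optimizations named in the abstract is packing small files for storage. The abstract does not describe the thesis's actual packing format, so the following is only a minimal sketch of the general idea, under the assumption that many small files are appended to one large blob and located through an index of offsets and lengths. The class `PackedContainer` and its methods are hypothetical names, and an in-memory buffer stands in for a file on HDFS.

```python
import io
from dataclasses import dataclass, field


@dataclass
class PackedContainer:
    """Hypothetical append-only container holding many small files in one blob."""

    # In-memory stand-in for a single large file stored on HDFS (assumption).
    blob: io.BytesIO = field(default_factory=io.BytesIO)
    # Maps a file name to its (offset, length) inside the blob.
    index: dict = field(default_factory=dict)

    def add(self, name: str, data: bytes) -> None:
        """Append one small file to the blob and record where it lives."""
        if name in self.index:
            raise KeyError(f"{name!r} is already packed")
        offset = self.blob.seek(0, io.SEEK_END)  # always write at the end
        self.blob.write(data)
        self.index[name] = (offset, len(data))

    def read(self, name: str) -> bytes:
        """Return the bytes of one packed file using the index."""
        offset, length = self.index[name]
        self.blob.seek(offset)
        return self.blob.read(length)


container = PackedContainer()
container.add("a.txt", b"hello")
container.add("b.txt", b"cloud storage")
print(container.read("b.txt"))
```

With a layout like this, the underlying file system would track one large object instead of many tiny ones, which is the usual motivation for packing; how the index is persisted and how deleted entries are reclaimed are left open here.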

Article ID: 2319262

Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2319262.html