
Research and Implementation of Access Control and Small-File Storage Strategies for HDFS

Published: 2018-07-28 20:49
[Abstract]: Hadoop was originally designed to store and analyze big data, and it handles such data well. HDFS (Hadoop Distributed File System), its underlying storage layer, is low-cost and well suited to large files. However, HDFS has weak access control: although Hadoop currently supports Kerberos user authentication, Kerberos has high overhead and poor flexibility. In addition, while HDFS supports large files well, it supports small files poorly. When many small files are stored in HDFS, their metadata occupies a large amount of space on the master node, which limits the number of files the file system can hold. Small files are also read inefficiently, and reading many of them degrades the master node's I/O performance. To improve security, Hadoop introduced encryption zones, but these offer only a single encryption algorithm, do not support iterative directory encryption, and do not provide application-level encryption.

This thesis therefore studies access control, small-file merging, and file encryption, and proposes three optimizations:
(1) An access control method that uses a trust value as its metric. It strengthens HDFS access control through feedback based on each user's access history.
(2) Association rule mining over users' access history. Based on the resulting frequent itemsets, small files are merged and the merged files are stored in HDFS, with a two-level cache strategy to improve read efficiency.
(3) A pluggable approach to file encryption, so that data is stored in HDFS as ciphertext, improving data security. The encryption and decryption strategy is implemented in two forms, MapReduce-oriented and client-oriented, with a custom InputFormat providing MapReduce support.

Access control, small-file merging, and file encryption were implemented on an experimental cluster and tested with medical images. The results show that the proposed trust-value-based access control performs well and adds little time overhead compared with the original HDFS. Small-file merging proves necessary: it greatly reduces the space occupied by metadata and, under a centralized access pattern, achieves a good cache hit rate and improves read efficiency. Tests were run with no encryption, XOR-AES, and AES, in both the client-oriented and MapReduce-oriented settings. XOR-AES adds some time overhead but outperforms AES, and its overhead is small in the MapReduce-oriented setting. Testing indicates that the proposed strategies achieve the expected results.
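The abstract does not spell out the merging algorithm, so the following is only a minimal sketch of the general idea in point (2): find files that are frequently accessed together, group them, and pack each group into one merged file with an offset index. It uses simple pair counting as a stand-in for full association rule mining, and it omits the two-level cache. All function names, the `min_support` parameter, and the in-memory `contents` dictionary are assumptions made for illustration, not the thesis's implementation.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(sessions, min_support):
    """Return file pairs that co-occur in at least `min_support` sessions."""
    counts = Counter()
    for session in sessions:
        # sorted(set(...)) gives canonical pairs, counted once per session
        for pair in combinations(sorted(set(session)), 2):
            counts[pair] += 1
    return {pair for pair, n in counts.items() if n >= min_support}


def merge_groups(files, pairs):
    """Group files linked by frequent pairs (union-find).

    Assumes every file named in `pairs` also appears in `files`.
    Files with no frequent partner end up in singleton groups.
    """
    parent = {f: f for f in files}

    def find(f):
        while parent[f] != f:
            parent[f] = parent[parent[f]]  # path halving
            f = parent[f]
        return f

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for f in files:
        groups.setdefault(find(f), []).append(f)
    return sorted(sorted(g) for g in groups.values())


def pack(group, contents):
    """Concatenate a group's files into one blob plus a (offset, length) index."""
    blob, index = bytearray(), {}
    for name in group:
        data = contents[name]
        index[name] = (len(blob), len(data))
        blob += data
    return bytes(blob), index


def read_file(blob, index, name):
    """Read one small file back out of a merged blob."""
    offset, length = index[name]
    return blob[offset:offset + length]


# Hypothetical access history: each inner list is one user session.
sessions = [
    ["a.dcm", "b.dcm"],
    ["a.dcm", "b.dcm", "c.dcm"],
    ["c.dcm", "d.dcm"],
    ["a.dcm", "b.dcm"],
]
contents = {"a.dcm": b"AAAA", "b.dcm": b"BB", "c.dcm": b"C", "d.dcm": b"DDD"}

pairs = frequent_pairs(sessions, min_support=2)
groups = merge_groups(contents.keys(), pairs)
blob, index = pack(groups[0], contents)
```

In this sketch only the merged blob and its index would need an entry on the master node, which is the source of the metadata savings the abstract reports; a real deployment would write the blob to HDFS rather than keep it in memory.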
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP333; TP309

Article ID: 2151535


Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2151535.html

