Research and Implementation of Access Control and Small-File Storage Strategies for HDFS
[Abstract]: Hadoop was designed from the outset to store and analyze big data, and it has clear advantages in big data processing. HDFS (Hadoop Distributed File System), its underlying storage layer, is low-cost and well suited to large files. However, HDFS access control is weak: although Hadoop currently supports Kerberos user authentication, Kerberos is costly and inflexible. In addition, HDFS handles large files well but small files poorly. When a large number of small files are stored in HDFS, their metadata occupies a great deal of space on the master node, which limits the number of files the whole file system can hold. Reading small files from HDFS is also inefficient, and reading a large number of them degrades the I/O performance of the master node. To improve security, Hadoop introduced encryption zones, but they offer only a single encryption algorithm, do not support nested directory encryption, and provide no application-level encryption. This thesis therefore studies three aspects, namely access control, small-file merging, and file encryption, and proposes three optimizations: (1) An access control method that uses a trust value as its index; the trust value is derived from the user's access history, and this feedback-based method strengthens HDFS access control. (2) Association rules are mined from users' access history, and small files are merged on the basis of frequent itemsets before being stored in HDFS; a two-level cache strategy is adopted to improve read efficiency. (3) Data is encrypted in a pluggable way and stored in HDFS as ciphertext to improve data security. The encryption and decryption policies are implemented in MapReduce-oriented and client-oriented forms, and a custom InputFormat is provided to support MapReduce.
Access control, small-file merging, and file encryption were implemented on an experimental cluster. The experimental results show that, compared with the original HDFS, the proposed trust-value-based access control performs well and adds little extra time cost. The small-file merging strategy proves necessary: it greatly reduces the space occupied by metadata, achieves a good cache hit ratio under centralized access patterns, and improves read efficiency. Tests comparing no encryption, XOR-AES, and AES in the client-oriented and MapReduce-oriented modes show that XOR-AES incurs some time overhead but performs better than the AES algorithm, and that the overhead is small in the MapReduce-oriented mode. The tests indicate that the strategies proposed in this thesis achieve the desired results.
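The abstract does not say which mining algorithm the thesis uses, so the following is only a minimal sketch of the idea behind optimization (2): files that frequently appear together in access sessions become candidates for merging into one container. It substitutes simple pairwise co-occurrence counting for full frequent-itemset mining, and every name in it (`frequent_pairs`, `merge_groups`, `min_support`, the sample file names) is hypothetical rather than taken from the thesis.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(sessions, min_support):
    """Count the sessions containing each file pair; keep the frequent ones.

    Assumption: a "session" is the list of small files one user read together.
    """
    counts = Counter()
    for session in sessions:
        # Deduplicate and sort so each pair has a single canonical form.
        for pair in combinations(sorted(set(session)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


def merge_groups(sessions, min_support):
    """Union files linked by a frequent pair into candidate merge groups."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in frequent_pairs(sessions, min_support):
        parent[find(a)] = find(b)

    groups = {}
    for name in list(parent):
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    sessions = [
        ["a.log", "b.log"],
        ["a.log", "b.log", "c.log"],
        ["c.log", "d.log"],
        ["c.log", "d.log"],
        ["e.log"],
    ]
    print(merge_groups(sessions, min_support=2))
```

With the sample sessions and `min_support=2`, the pairs (`a.log`, `b.log`) and (`c.log`, `d.log`) each occur twice, so the sketch proposes two merge groups and leaves `e.log` unmerged. A real implementation would also need to cap group size by the HDFS block size and maintain an index from each small file to its offset in the merged file; neither is shown here.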
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP333;TP309