Energy-Efficiency-Oriented Small File Merging on the HDFS Platform
Published: 2018-10-18 09:40
[Abstract]: Small files on the Hadoop Distributed File System (HDFS) raise the energy cost of running MapReduce programs. To address this problem, an energy consumption model of a Hadoop node cluster is established and analyzed, and it is shown that on the Hadoop platform there exists an optimal file size that minimizes the energy cost of running a program. On this basis, and drawing on marginal analysis from economics, a strategy is proposed for determining the optimal file size that accounts for both energy cost and access cost. The strategy computes the benefit of merging small files stored on HDFS and merges them into files of the cost-optimal size to obtain the greatest benefit. Experiments confirm that an energy-optimal data block size exists, and demonstrate that using marginal analysis to combine cost and benefit when determining the data block size is both reasonable and effective.
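The marginal-analysis idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's energy model, so everything here is an assumption made for illustration: the cost form `overhead / size + access_rate * size`, the parameter names, and the search bounds are hypothetical and are not the authors' model. The sketch only shows how an optimum arises where the marginal saving from merging equals the marginal increase in access cost.

```python
def total_cost(size_mb, overhead, access_rate):
    """Hypothetical cost per unit of data when files have size `size_mb`.

    overhead / size_mb : per-file overhead, which shrinks as files are merged (assumed form)
    access_rate * size_mb : access cost, which grows with merged file size (assumed form)
    """
    return overhead / size_mb + access_rate * size_mb


def marginal_cost(size_mb, overhead, access_rate):
    """Derivative of total_cost with respect to file size."""
    return -overhead / size_mb ** 2 + access_rate


def optimal_size(overhead, access_rate, lo=1.0, hi=4096.0, tol=1e-6):
    """Bisect on the marginal cost to find the size where it crosses zero.

    Assumes the marginal cost is negative at `lo` and positive at `hi`.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if marginal_cost(mid, overhead, access_rate) < 0:
            lo = mid  # cost is still falling: merging further still pays off
        else:
            hi = mid  # cost is rising: the optimum lies at a smaller size
    return (lo + hi) / 2


if __name__ == "__main__":
    # Illustrative parameters only, not measured values.
    best = optimal_size(overhead=1600.0, access_rate=0.25)
    print(f"cost-optimal size under the toy model: {best:.2f} MB")
```

Under this toy model the optimum has the closed form `sqrt(overhead / access_rate)`. The bisection is used only to mirror the marginal reasoning, and it would also work for a cost curve that has no closed form.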
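[Author affiliations]: School of Software, Central South University; School of Software, Henan University; School of Computer Science, Beijing Information Science and Technology University
[Funding]: National Natural Science Foundation of China (61272148; 61301136); Specialized Research Fund for the Doctoral Program of Higher Education (20120162110061; 20120162120091)
[Classification number]: TP333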
Article ID: 2278736
Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2278736.html