Research on Replica Optimization Strategies in Data-Intensive Computing

Published: 2018-05-03 01:25

Topics: data-intensive computing + data grid; Source: Fuzhou University, master's thesis, 2014

【Abstract】: The rapid growth of the Internet and the spread of broadband have accelerated the networking and informatization of every industry, while the ever-growing volume of network data poses a major challenge to computing systems. The ability to manage massive data has become the performance bottleneck in the development of computing power, and systems that store and process network data are gradually turning into data-intensive systems. Against this background, data-intensive computing (DIC) emerged and has attracted wide attention. Data management is a core problem in data-intensive computing systems, and replica management is a widely adopted and effective technique for addressing it. Replica management comprises four key techniques: replica creation, replica selection, replica replacement, and replica consistency maintenance. It helps improve data reliability, balance network load, and reduce data access latency and bandwidth consumption. Building on a survey of replica management in data-intensive computing environments, this thesis focuses on replica selection and replica replacement. To address the shortcomings of existing strategies, it proposes new replica management optimizations. The main work covers two aspects:

(1) For replica selection in data-intensive computing environments, the thesis builds on existing strategies and proposes an improved replica selection strategy based on the ant colony algorithm. It takes the unbounded positive feedback of the ant colony algorithm into account and selects replicas probabilistically rather than absolutely. This avoids a single replica being accessed so frequently that it eventually causes network congestion and disturbs other data transfers in progress. The mainstream grid simulator OptorSim is then extended to implement the proposed algorithm, which is compared in simulation experiments with the original algorithm and with SimpleOptimiser, the replica optimization algorithm bundled with the simulator.

(2) Based on the Least Recently Used (LRU) replica replacement strategy, the thesis proposes the LRULR (Least Recently Used and Least Replicas) algorithm. The new strategy also takes the distribution of files across the whole data grid into account when replacing replicas. It can effectively improve the replica hit rate and access efficiency in data-intensive computing, and reduce both the number of replications and the bandwidth consumed by data transfer. Its main idea is that, when storage capacity is insufficient, it replaces the replica with the fewest copies globally among the set of least recently used replicas. The new strategy is then implemented in OptorSim and compared experimentally with the LRU algorithm.

The thesis thus proposes optimization strategies for both replica selection and replica replacement in data-intensive computing and compares them with the original strategies on a simulation platform. The experiments on OptorSim show that the proposed algorithms offer certain advantages in reducing mean job time, lowering network bandwidth consumption, and balancing network load.
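
The abstract does not give the selection formula, so the following Python sketch only illustrates what "probabilistic rather than absolute" selection can look like. It assumes the textbook ant colony transition rule, in which a replica site is chosen with probability proportional to pheromone^alpha × heuristic^beta. The use of bandwidth as the heuristic, the parameter values, and the names `selection_probabilities` and `select_replica` are all hypothetical and are not taken from the thesis or from OptorSim.

```python
import random


def selection_probabilities(pheromone, bandwidth, alpha=1.0, beta=2.0):
    """Return {site: probability} under the textbook ACO transition rule.

    pheromone and bandwidth are dicts keyed by replica site.
    alpha and beta are assumed weights, not values from the thesis.
    """
    weights = {
        site: (pheromone[site] ** alpha) * (bandwidth[site] ** beta)
        for site in pheromone
    }
    total = sum(weights.values())
    return {site: weight / total for site, weight in weights.items()}


def select_replica(pheromone, bandwidth, rng=random):
    """Draw one replica site at random, weighted by its probability.

    A greedy ("absolute") strategy would always return the top-weighted
    site; drawing at random spreads requests over several sites.
    """
    probs = selection_probabilities(pheromone, bandwidth)
    sites = list(probs)
    return rng.choices(sites, weights=[probs[s] for s in sites], k=1)[0]


# Hypothetical example: three sites holding the same file.
pheromone = {"siteA": 3.0, "siteB": 1.0, "siteC": 1.0}
bandwidth = {"siteA": 100.0, "siteB": 100.0, "siteC": 50.0}
probs = selection_probabilities(pheromone, bandwidth)
chosen = select_replica(pheromone, bandwidth, rng=random.Random(0))
```

Under these assumed numbers siteA is the most likely choice, but siteB and siteC still receive a share of the requests, which is the load-spreading effect the abstract attributes to probabilistic selection.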
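
The LRULR eviction rule described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's OptorSim implementation: the size of the "least recently used replica set" (`candidate_size`), the tie-breaking rule, and the function name `choose_victim` are hypothetical, since the abstract does not specify them.

```python
def choose_victim(last_access, global_count, candidate_size=3):
    """Pick the local replica to evict when storage is full.

    last_access:  dict file -> last access time at this site.
    global_count: dict file -> number of replicas of that file in the grid.
    candidate_size: assumed size of the least-recently-used candidate set.
    """
    if not last_access:
        raise ValueError("no local replicas to evict")
    # Step 1: order local replicas from oldest to newest access.
    by_age = sorted(last_access, key=last_access.get)
    # Step 2: keep only the least recently used candidates.
    candidates = by_age[:candidate_size]
    # Step 3: as the abstract states, evict the candidate with the fewest
    # copies globally; ties go to the older replica (an assumption).
    return min(candidates, key=lambda f: (global_count[f], last_access[f]))


# Hypothetical example: four local replicas and their grid-wide counts.
last_access = {"f1": 10, "f2": 12, "f3": 15, "f4": 40}
global_count = {"f1": 5, "f2": 2, "f3": 4, "f4": 1}
victim = choose_victim(last_access, global_count)
```

With `candidate_size=1` the rule reduces to plain LRU and evicts `f1`. With the assumed default of 3, the recently used `f4` is never a candidate, and `f2` is evicted because it has the fewest global copies among the three oldest replicas.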

【Degree-granting institution】: Fuzhou University
【Degree level】: Master's
【Year degree granted】: 2014
【Classification number】: TP393.09;TP18

【Similar literature】