Research on Resource Allocation Techniques for Distributed Cloud Computing

Published: 2018-08-05 17:00
[Abstract]: Adopting an efficient resource allocation mechanism that satisfies users is one of the key factors in a cloud service provider's success. In the Infrastructure as a Service (IaaS) model, cloud service providers deliver services mainly in the form of virtual machines (VMs), so the resource allocation problem reduces to a VM placement problem subject to business requirements. Following the order in which VMs are deployed, this thesis takes a top-down approach and divides VM placement in distributed cloud computing into three stages: cloud network selection, data center selection, and server selection. Targeted resource allocation strategies are then given for the specific business requirements of each stage.

In the cloud network selection stage, a cloud broker exploits the price advantage of long-term resource reservation and the gain from statistical multiplexing, leasing infrastructure from public cloud providers to reduce its own cost. Because providers offer virtual machine reserved instances with multiple terms, and instances with different terms are priced differently, the broker must choose appropriately among the terms so that users' real-time, dynamic demand is met at minimum cost. For this goal, the thesis proposes a longest-term-first, layer-by-layer reservation heuristic and a set-cover-based approximation algorithm for the offline resource allocation problem, together with an online resource allocation algorithm based on historical usage information. Experiments show that the offline heuristic runs almost twice as fast as the offline approximation algorithm; both offline algorithms save about 27% of the cost compared with using on-demand VMs exclusively, and the online algorithm saves about 14%. Compared with existing algorithms that consider only a single reserved instance term, the proposed algorithms are more practical.

In the data center selection stage, the thesis studies two problems: clustering-based data center selection for VM placement, and data center selection when migrating big data to the cloud. The first problem aims to optimize inter-data-center communication latency and bandwidth. A group of communicating VMs serving a large task or an organization may span multiple data centers, so the maximum inter-data-center distance should be minimized, both to reduce communication latency and to save expensive long-haul inter-data-center bandwidth. The thesis first proposes a density-clustering-based 2-approximation data center selection algorithm. Compared with existing methods that consider only inter-data-center distances, it exploits the data centers' network topology, density, and capacity information, improving efficiency by about 15% to 72%. A VM semi-communication model is then established, and a VM partitioning algorithm based on this model groups the VMs and maps the groups onto the selected data centers. Experiments show that this algorithm further reduces inter-data-center bandwidth consumption and runs about twice as fast as existing algorithms. All of the algorithms apply to both homogeneous and heterogeneous VMs, overcoming the limitation of existing algorithms that handle only the homogeneous case. The second problem aims to select suitable data centers for distributed big data so that migrating the data to the cloud guarantees fast local access while achieving low-cost migration and processing. The thesis first analyzes four objectives of big data migration: fair data placement, preference-aware data placement, transfer-cost-minimizing data placement, and total-cost-minimizing data placement, and then models the problem with a bipartite graph. For the first two objectives, a tight 3-approximation algorithm based on combinatorial optimization is proposed; for the latter two, a nearest-data-center-first heuristic. Both algorithms reduce access latency and cost, and handle the case where some data centers are unavailable because of regulations or user preferences, remedying the assumption in existing methods that all data centers are available.

In the server selection stage, quality-of-service-aware, low-cost server selection is required. Servers and networking account for about 60% of a data center's total cost, so deploying VMs effectively, saving as much cost as possible while guaranteeing service quality, is crucial to a cloud provider's competitiveness. Considering heterogeneous servers and the randomness of VMs' multiple resource demands, the problem is formulated as a multi-objective nonlinear program. Using the data center's network topology, VMs with heavier mutual traffic are placed together whenever possible, reducing communication latency and saving inter-server bandwidth. Meanwhile, statistical multiplexing and a newly defined "similarity" measure are used to consolidate VMs, keeping the probability of resource capacity violation within a specified small bound. Such a placement saves server and network cost without degrading service quality. Heuristic algorithms are given for both offline and online scenarios; experiments show that, compared with existing algorithms, they save more cost across multiple resources, including servers and bandwidth, and run faster.

Finally, the thesis considers the joint selection of data centers and servers for big data processing across data centers. Since migrating globally distributed data into a single data center for analysis is not always feasible, while the widely adopted big data framework Hadoop can only process data inside one data center, a cross-data-center big data processing mechanism is needed. The thesis proposes a new cross-data-center processing architecture and a key-value-based solution that follows traditional Hadoop's data locality principle as far as possible while processing data across data centers at lower cost. The problem is modeled as a bilevel program and solved with a customized bilevel grouping genetic algorithm. Experiments verify the solution's effectiveness: compared with a scheme based on traditional Hadoop and the latest existing scheme, it saves about 49% and 40% of the cost, respectively.
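The reserved-instance selection idea from the cloud-network stage can be sketched as follows. This is a minimal illustration, not the thesis's exact algorithm: the cost model (a flat on-demand price per slot, one total price per term) and the layer-peeling scheme are assumptions made for the example.

```python
# Sketch of a longest-term-first, layer-by-layer reservation heuristic.
# Assumption: demand is a per-slot VM count; a reservation term covers
# one VM for `length` consecutive slots at a fixed total price.

def layered_reservation(demand, terms, on_demand_price):
    """demand: required VM count per time slot.
    terms: (length_in_slots, total_price) pairs, longest term first.
    Returns (reservation_plan, total_cost)."""
    horizon = len(demand)
    residual = list(demand)
    plan, cost = [], 0.0
    while any(residual):
        active = [t for t in range(horizon) if residual[t] > 0]
        start = active[0]
        for length, price in terms:  # prefer the longest term
            # slots of this demand layer the reservation would cover
            busy = sum(1 for s in active if start <= s < start + length)
            if price <= busy * on_demand_price:  # reservation pays off
                plan.append(("reserve", start, length))
                cost += price
                end = start + length
                break
        else:  # no term is worthwhile: buy a single on-demand slot
            plan.append(("on_demand", start, 1))
            cost += on_demand_price
            end = start + 1
        # peel the covered part of this one-VM layer off the demand curve
        for s in range(start, min(end, horizon)):
            if residual[s] > 0:
                residual[s] -= 1
    return plan, cost
```

With demand `[2, 2]`, one 2-slot term priced 1.5, and on-demand price 1.0, the heuristic reserves two instances for 3.0 instead of paying 4.0 on demand.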
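The data-center selection objective of minimizing the maximum inter-data-center distance admits a classic 2-approximation argument: try each data center as an anchor, take its k-1 nearest neighbours, and by the triangle inequality the best such group's diameter is at most twice the optimum. The sketch below shows only this core idea; the thesis's density-clustering algorithm additionally uses topology and capacity information.

```python
# Illustrative 2-approximation for choosing k data centers with minimum
# maximum pairwise distance (triangle-inequality argument; not the
# thesis's full density-clustering algorithm).

def select_datacenters(dist, k):
    """dist: symmetric matrix of inter-DC distances (metric).
    Returns (indices of k chosen DCs, their max pairwise distance)."""
    n = len(dist)
    best, best_diam = None, float("inf")
    for anchor in range(n):
        # the anchor plus its k-1 nearest neighbours
        group = sorted(range(n), key=lambda j: dist[anchor][j])[:k]
        diam = max(dist[a][b] for a in group for b in group)
        if diam < best_diam:
            best, best_diam = group, diam
    return best, best_diam
```

For data centers at positions 0, 1, 2, and 10 on a line, the best trio is {0, 1, 2} with diameter 2, which the anchor scan finds.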
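The nearest-data-center-first heuristic for migrating data blocks can be sketched as a simple greedy assignment. The cost matrix, capacity model, and handling of unavailable data centers below are illustrative assumptions chosen to mirror the constraints the abstract mentions (regulatory or preference-based unavailability), not the thesis's exact formulation.

```python
# Hedged sketch of a nearest-data-center-first placement heuristic.
# Assumption: costs[b][d] is the transfer cost of block b to DC d, and
# each DC can hold a fixed number of blocks.

def place_blocks(costs, capacity, available):
    """capacity[d]: blocks DC d can still take; available[d]: whether
    DC d may legally be used. Returns (assignment, total_cost)."""
    cap = list(capacity)
    assignment, total = [], 0.0
    for b, row in enumerate(costs):
        # rule out DCs excluded by regulation/preference or already full
        cands = [d for d in range(len(row)) if available[d] and cap[d] > 0]
        if not cands:
            raise RuntimeError("no feasible data center for block %d" % b)
        d = min(cands, key=lambda i: row[i])  # nearest (cheapest) first
        assignment.append(d)
        cap[d] -= 1
        total += row[d]
    return assignment, total
```

When one data center becomes unavailable, all blocks fall back to the remaining ones, which is exactly the situation the abstract says prior work could not express.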
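The key-value-based, locality-respecting idea for cross-data-center processing can be illustrated with a toy assignment rule: process each key in the data center that already holds the largest share of that key's records, so the fewest bytes cross data-center boundaries. This is only a sketch of the locality principle; the thesis's actual solution is a bilevel program solved by a grouping genetic algorithm.

```python
# Toy sketch of key-level data locality across data centers (assumed
# model: key_volumes[key][dc] = bytes of that key stored at that DC).

def choose_processing_dc(key_volumes):
    """Assign each key to the DC minimising bytes shipped for it.
    Returns ({key: chosen_dc}, total bytes moved between DCs)."""
    plan, shipped = {}, 0
    for key, vols in key_volumes.items():
        # the DC with the largest local share ships the least data
        dc = max(vols, key=vols.get)
        plan[key] = dc
        shipped += sum(vols.values()) - vols[dc]
    return plan, shipped
```

A per-key choice like this is what lets different keys of the same job run in different data centers, instead of first copying everything to one site.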
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Doctorate
[Year conferred]: 2016
[Classification number]: TP301.6

