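Research and Implementation of a Virtual Machine Management System for Large-Scale Graph Data Processing
Topics: large-scale graph data processing + Pregel system; Source: Southeast University, master's thesis, 2016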
[Abstract]: With the continued development of e-commerce, the mobile Internet, the Internet of Things, and related technologies, the scale, generation rate, and complexity of data keep growing, marking society's entry into the era of big data. As data items become more tightly connected and their dependencies more complex, the distribution of some data increasingly takes on the characteristics of a graph. Traditional big data processing techniques, such as the MapReduce batch-processing framework, are ill-suited to graph data with complex relationships that requires many iterations. Google's Pregel system performs vertex computations in parallel, which greatly improves computing performance and offers a new approach to large-scale graph data processing. Existing research on large-scale graph data processing builds on the ideas of Pregel and partially solves the problem, but two issues remain. First, it ignores the performance degradation caused by resource contention among applications that are not isolated from one another. Second, it ignores the elastic resource demands of an application's different phases, which leads to performance loss or wasted resources.

To address these problems, this master's thesis introduces virtualization into graph data processing. By analyzing in depth the execution characteristics of graph data processing, and by exploiting the strong process isolation and the flexible, elastic resource management that virtualization provides, it proposes application-oriented graph partitioning and resource allocation and scheduling mechanisms. These mechanisms supply resources elastically according to an application's actual execution pattern, with the goal of improving the overall execution efficiency of the graph data processing system.

The thesis carries out research in four areas. First, it studies mechanisms for extracting and analyzing the execution patterns of large-scale graph data processing applications. Through secondary development on an open-source Pregel-like system, it extracts application execution patterns and establishes a mapping between execution patterns and underlying resource requirements. This mapping provides a reliable theoretical basis for the later allocation and scheduling of virtual resources and is the foundation of the work. Second, it studies an application-aware method for partitioning large-scale graph data. Partitioning is a prerequisite for parallel computation. In a virtualized environment, the thesis partitions graph data according to the application's execution pattern so as to reduce network communication and balance load. A sound partition both supports better resource allocation and scheduling and improves application performance. Third, it studies a virtual resource allocation and scheduling mechanism oriented to application execution patterns. Based on the mapping between execution patterns and underlying resource requirements, the mechanism allocates and schedules resources at fine granularity according to the execution pattern of the upper-layer application, raising resource utilization while preserving application performance. Finally, the thesis builds a virtualized environment by deploying OpenStack and implements the above research on top of it. It designs and develops nutcat, a large-scale graph data processing platform that integrates an application feature extraction module, an application-aware super-block partitioning module, and an execution-pattern-oriented virtual resource allocation and scheduling module, and deploys it in the production environment of the Southeast University cloud computing center (SEU CLOUD).

Experiments in the real Southeast University cloud computing center environment show that the proposed application-aware graph partitioning method and the execution-pattern-oriented virtual resource allocation and scheduling mechanism significantly improve application performance and raise virtual resource utilization. They also offer a new, application-oriented approach to resource allocation and scheduling for combining big data applications with virtualized environments.
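To make the link between vertex-parallel computation, partitioning, and network communication concrete, the following is a minimal sketch of a Pregel-style superstep loop. It is not taken from the thesis and is not the nutcat implementation or Pregel's API: the function `run_supersteps`, the toy graph, and both worker placements are hypothetical and chosen only for illustration. Each vertex sends its value to its neighbors, messages are delivered at the barrier between supersteps, and a vertex stays active only if its value changed. The sketch also counts messages whose sender and receiver sit on different workers, as a rough proxy for network traffic.

```python
from collections import defaultdict


def run_supersteps(edges, values, partition):
    """Propagate the maximum vertex value in Pregel-style supersteps.

    edges:     dict vertex -> list of neighbor vertices
    values:    dict vertex -> initial numeric value
    partition: dict vertex -> worker id (hypothetical placement)
    Returns (final values, number of supersteps, cross-worker messages).
    """
    values = dict(values)          # work on a copy, leave the input untouched
    active = set(values)           # every vertex is active in the first superstep
    supersteps = 0
    remote_messages = 0

    while active:
        # Compute phase: each active vertex sends its value to its neighbors.
        inbox = defaultdict(list)
        for v in active:
            for n in edges.get(v, []):
                inbox[n].append(values[v])
                if partition[v] != partition[n]:
                    remote_messages += 1   # message would cross the network

        # Barrier: messages are consumed only after all sends have finished.
        active = set()
        for v, msgs in inbox.items():
            best = max(msgs)
            if best > values[v]:
                values[v] = best
                active.add(v)      # a changed vertex sends again next superstep
        supersteps += 1

    return values, supersteps, remote_messages


# Toy undirected path 1 - 2 - 3 - 4, stored as adjacency lists in both directions.
edges = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
values = {1: 3, 2: 6, 3: 2, 4: 1}

# Two hypothetical placements onto two workers with equal load (2 vertices each).
contiguous = {1: 0, 2: 0, 3: 1, 4: 1}    # only edge 2-3 crosses workers
interleaved = {1: 0, 2: 1, 3: 0, 4: 1}   # every edge crosses workers

if __name__ == "__main__":
    for name, placement in [("contiguous", contiguous), ("interleaved", interleaved)]:
        final, steps, remote = run_supersteps(edges, values, placement)
        print(name, final, steps, remote)
```

Both placements give each worker two vertices, so load is balanced in either case, yet on this four-vertex path the contiguous placement sends 3 cross-worker messages while the interleaved one sends 11. This is only a toy illustration of why the choice of partition matters for communication cost; it says nothing about how the thesis's super-block partitioning actually works.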
[Degree-granting institution]: Southeast University
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP311.52; TP302
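[Citation]: Zhang Junxue (张骏雪); Research and Implementation of a Virtual Machine Management System for Large-Scale Graph Data Processing [D]; Southeast University; 2016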