Optimized Design and Implementation of a Load Balancing Algorithm for Distributed Massive Data Storage Systems

Published: 2018-10-31 09:27

[Abstract]: The information society brings convenience but also generates enormous volumes of data, and relational databases are poorly suited to big data because of limitations in their architecture. The systems most commonly used today to store and process massive data are NoSQL (Not only Structured Query Language) databases. Because such a system is distributed across many independent computers, coordinating them to complete tasks efficiently is essential, and that is the role of load balancing. For a given hardware architecture, load balancing is key to overall system performance, and with the rise of cloud computing it has quickly become an active research topic. After studying common NoSQL systems and load balancing techniques, this thesis designs a new load balancing algorithm. The algorithm exploits the characteristics of big data processing systems and performs well. It can also quickly find the next-best node when the optimal node is already occupied, which makes it well suited to load balancing in highly concurrent systems.

Because memory access is roughly 100,000 times faster than a disk seek, it is desirable to preload as much data as possible from disk into memory in order to improve response time. Memory capacity, however, is far smaller than disk capacity, so it often becomes the bottleneck of a big data processing system. To use memory more efficiently, this thesis proposes a memory-pool-like approach that rewrites the allocator responsible for memory allocation, which substantially improves effective memory utilization.

The main contributions are as follows. First, common NoSQL databases and load balancing algorithms are studied, and an efficient load balancing algorithm is designed and implemented. By taking into account the relationship between a node and the children of its sibling node, together with rotation operations, the algorithm effectively reduces the height of the binary search tree. Its time cost drops from the O(N) of the commonly used weighted round-robin algorithm to O(log N), while its space cost remains O(N). Besides pseudocode, the thesis provides an analysis of the algorithm and a flowchart from which it can be ported to other similar systems. Second, the allocator, which is responsible for allocating and reclaiming memory, is optimized. The default allocator is inefficient and prone to memory fragmentation. Following the memory pool idea, the optimized allocator requests one large region of memory from the kernel up front and manages it itself, improving the efficiency of allocation and use. Several allocators are implemented for different scenarios, and their settings are parameterized so that scripts can configure them per scenario and the system can be tuned dynamically. Finally, a test environment is built, the algorithm and the memory optimization are tested, and the thesis closes with a summary and an outlook on future work.
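The abstract does not reproduce the thesis's pseudocode, so the following is only a minimal sketch of the general idea: keeping servers in a height-balanced binary search tree ordered by load, so that selecting the least-loaded server costs O(log N) instead of a linear scan. It swaps in a plain AVL tree as the balancing scheme, which may differ from the thesis's own rules, and it ignores weights. The names `Node`, `insert`, `pop_min`, `LoadBalancer`, and the `(load, server name)` key layout are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Key = Tuple[int, str]  # (current load, server name); hypothetical key layout


@dataclass
class Node:
    key: Key
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    height: int = 1


def height(n: Optional[Node]) -> int:
    return n.height if n else 0


def update(n: Node) -> None:
    n.height = 1 + max(height(n.left), height(n.right))


def rotate_right(y: Node) -> Node:
    x = y.left
    y.left, x.right = x.right, y
    update(y)
    update(x)
    return x


def rotate_left(x: Node) -> Node:
    y = x.right
    x.right, y.left = y.left, x
    update(x)
    update(y)
    return y


def rebalance(n: Node) -> Node:
    """Restore the AVL invariant at n using at most two rotations."""
    update(n)
    balance = height(n.left) - height(n.right)
    if balance > 1:  # left-heavy
        if height(n.left.left) < height(n.left.right):
            n.left = rotate_left(n.left)  # left-right case
        return rotate_right(n)
    if balance < -1:  # right-heavy
        if height(n.right.right) < height(n.right.left):
            n.right = rotate_right(n.right)  # right-left case
        return rotate_left(n)
    return n


def insert(n: Optional[Node], key: Key) -> Node:
    if n is None:
        return Node(key)
    if key < n.key:
        n.left = insert(n.left, key)
    else:
        n.right = insert(n.right, key)
    return rebalance(n)


def pop_min(n: Node) -> Tuple[Optional[Node], Key]:
    """Remove the smallest key; returns (new subtree root, removed key)."""
    if n.left is None:
        return n.right, n.key
    n.left, key = pop_min(n.left)
    return rebalance(n), key


class LoadBalancer:
    """Hypothetical wrapper: every server starts with a load of zero."""

    def __init__(self, servers):
        self.root = None
        for name in servers:
            self.root = insert(self.root, (0, name))

    def acquire(self) -> str:
        """Pick the least-loaded server and record one more request on it."""
        self.root, (load, name) = pop_min(self.root)
        self.root = insert(self.root, (load + 1, name))
        return name


lb = LoadBalancer(["a", "b", "c"])
print([lb.acquire() for _ in range(4)])  # ['a', 'b', 'c', 'a']
```

Once a server has been picked, its key moves to a higher load, so the next call naturally returns the next-best server. Both the removal and the reinsertion walk a single root-to-leaf path, which is where the O(log N) bound comes from.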
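The memory pool idea can be illustrated in the same hedged way. The sketch below is not the thesis's allocator: it is a conceptual model in which one large buffer is reserved up front (standing in for a single large request to the kernel) and fixed-size blocks are handed out from a free list. The class name `BlockPool`, the integer handles, and the `block_size` and `block_count` parameters are assumptions made for illustration.

```python
class BlockPool:
    """Fixed-size block pool carved out of one large buffer (a sketch)."""

    def __init__(self, block_size: int, block_count: int) -> None:
        self.block_size = block_size
        # One large up-front reservation, analogous to asking the kernel once.
        self._buffer = bytearray(block_size * block_count)
        self._view = memoryview(self._buffer)
        # Free list of block indices; reversed so that pop() yields 0 first.
        self._free = list(range(block_count - 1, -1, -1))
        self._in_use = set()

    @property
    def free_blocks(self) -> int:
        return len(self._free)

    def allocate(self) -> int:
        """Return a handle to a free block in O(1), with no new system request."""
        if not self._free:
            raise MemoryError("pool exhausted")
        handle = self._free.pop()
        self._in_use.add(handle)
        return handle

    def view(self, handle: int) -> memoryview:
        """Writable window onto the block; no bytes are copied."""
        if handle not in self._in_use:
            raise ValueError("block is not allocated")
        start = handle * self.block_size
        return self._view[start:start + self.block_size]

    def release(self, handle: int) -> None:
        """Return the block to the free list so it can be reused."""
        if handle not in self._in_use:
            raise ValueError("double free or unknown handle")
        self._in_use.remove(handle)
        self._free.append(handle)


pool = BlockPool(block_size=64, block_count=1024)
h = pool.allocate()
pool.view(h)[:5] = b"hello"
pool.release(h)
```

Because every block has the same size, a released block can always satisfy the next request, so this simple pool does not suffer from external fragmentation. Exposing the two constructor arguments as configuration is one plausible reading of the parameterized allocator settings the abstract mentions.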
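[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP333;TP311.13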

[References]