内存数据库存储结构及索引的研究与设计

发布时间：2018-09-17 13:59

【摘要】：20世纪80年代后，随着内存价格不断走低，存储芯片的集成度越来越高，数据库中的全部数据或者大部分数据存在主存中已完全成为可行，内存数据库的研究与应用开始逐步兴起。内存数据库在实时和高并发的应用中所展现出的数据访问能力是磁盘数据库不可比拟的，然而内存数据库的研究与应用毕竟起步比较晚，它的很多设计理念和传统的磁盘数据库不再完全一致。由于磁盘和内存这两种存储介质间的的巨大的差别，基于磁盘结构的数据库系统的数据结构、算法、查询策略、索引结构等，不一定适用于内存数据库系统。本文以内存数据库为研究方向，以内存数据库中的存储结构和索引为重点研究对象，关注于缓存和TLB。在存储结构上，深入的分析了广泛应用于当前内存数据库中N-Array存储结构，指出它的不足：较差的缓存利用率。随后文章给出了改进方案：利用离散存储结构的思想，在单个内存页面中对要存储的业务表的属性值进行分区储存。这样相同属性值在内存页面中空间局部性得到提高，在属性查询上该存储结构能提供更高的缓存利用率。在索引结构上，根据内存数据库索引结构的研究历程，，从当今广泛应用于内存数据库的T树到研究火热的缓存敏感树。在缓存敏感索引树的领域中重点分析了缓存敏感树CSS树、CSB+树、HT树，指出他们各自的优缺点及适用情形。CSS树更新代价太大、CSB+树由于一味注重缓存而忽略了TLB对性能的影响。在HT树的设计中通过把叶子节点设计成Hash桶来降低索引树的高度从而减少TLB失配。在HT树设计思想的指引下，本文对CSB+树提出改进，扩大树节点的扇出度，在兼顾缓存的同时控制索引树的高度。相比于CSB+树，改进的CSB+树中增加了树节点内分区和树节点内的索引。最后给出了改进CSB+树的查询和插入时最主要的算法。最后给出了改进的CSB+树和CSB+树的缓存失配和TLB失配实验对比。
[Abstract]:Since the 1980s, as the price of memory has been falling, the integration of memory chips has become more and more high, and it has become completely feasible to store all or most of the data in the database in the main memory. The research and application of memory database began to rise gradually. The data access ability of memory database in real-time and high concurrency application is incomparable to disk database. However, the research and application of memory database start relatively late. Many of its design ideas are no longer fully consistent with traditional disk databases. Because of the huge difference between disk and memory, the data structure, algorithm, query strategy and index structure of the database system based on disk structure may not be suitable for the memory database system. This paper takes the memory database as the research direction, and focuses on the storage structure and index in the memory database, focusing on the cache and TLB.. On the storage structure, this paper deeply analyzes the N-Array storage structure which is widely used in the current memory database, and points out its deficiency: poor cache utilization. Then an improved scheme is presented: using the idea of discrete storage structure, the attribute values of the business table to be stored are partitioned in a single memory page. In this way, the spatial locality of the same attribute value is improved in the memory page, and the storage structure can provide higher cache utilization on the attribute query. In the index structure, according to the research history of the index structure of the memory database, from the T tree which is widely used in the memory database nowadays to the cache sensitive tree which is the hot research. In the field of cache sensitive index tree, this paper mainly analyzes the cache sensitive tree CSS tree / CSS tree HT tree, and points out their respective advantages and disadvantages and their application. The cost of updating the CSS tree is too high. The TLB tree neglects the effect of TLB on the performance because of paying attention to cache blindly. In the design of HT tree, the TLB mismatch is reduced by reducing the height of the index tree by designing the leaf node as a Hash bucket. Under the guidance of the design idea of HT tree, this paper proposes an improvement on CSB tree to expand the fan out degree of tree node, and to control the height of index tree while taking into account the cache. Compared with the CSB tree, the improved CSB tree adds partitioning within the tree node and index within the tree node. Finally, the most important algorithms for query and insertion of improved CSB tree are given. Finally, the buffer mismatch between the improved CSB tree and the CSB tree is compared with the TLB mismatch experiment.
【学位授予单位】：湖南大学
【学位级别】：硕士
【学位授予年份】：2013
【分类号】：TP311.13;TP333

【参考文献】