基于列存储的数据库物理层优化研究

发布时间：2018-01-22 07:05

本文关键词： 列存储索引技术树索引元辅音树　出处：《华中科技大学》2013年硕士论文　论文类型：学位论文

【摘要】：由于网络数据的海量增长、数据仓库和OLAP的飞速发展以及商务数据分析的需求，在海量数据存储和分析方面占有优势的列存储得到很快的成长。但以列为导向的物理层存储结构意味着在设计列存储模块或列数据库的物理层时，需要采用不同于传统行存储的方式。同时，传统的许多优化技术和方法在列存储中的效率普遍不高，且存储代价较大。其中比较典型的例子是索引技术。因此，研究列存储的物理层架构和索引技术，对列数据库的开发和应用具有重要的意义。基于以上需求，研究了列存储的物理层架构，对物理层各模块进行设计，实现了一个列存储的原型系统。在数据组织上采用固定记录数据块的方式和基于大内存分配的内存池管理方式。在压缩算法上，采用基于字典编码的LZW压缩算法，并与基于统计编码的PPM压缩算法进行性能对比。针对英文单词特征的长字符串类型，设计了一种旨在减少不相关检索数据块的元辅音树。首先，针对列存储索引的需求和字符串特性，，设计了一种精简的树结构；基于该树的结构，研究了字符串输入过程的状态变化，并基于此定义了有限自动状态机的各元组。之后，针对该树结构和有限自动状态机的各元组定义，设计了树的初始化、存储、字符串扫描等操作算法；在对有限自动状态机进行状态转移和状态推导的基础上，设计了查询匹配算法。在实际应用于列存储时，对元辅音树进一步改进，设计出元辅音根树和数据块元辅音树的双层结构，同时采用单模式和双模式匹配相结合的策略，在一次单模式匹配基础上进行二次双模式匹配，以此更进一步提高查询效率。
[Abstract]:Because of the huge growth of network data, the rapid development of data warehouse and OLAP and the demand of business data analysis. Column storage, which has an advantage in mass data storage and analysis, has grown rapidly, but a column-oriented physical layer storage structure means when designing a column storage module or column database physical layer. At the same time, many of the traditional optimization techniques and methods in column storage are generally inefficient, and the storage cost is high. The typical example is the index technology. It is of great significance for the development and application of column database to study the physical layer architecture and index technology of column storage. Based on the above requirements, the physical layer architecture of column storage is studied, and each module of the physical layer is designed. A prototype system of column storage is implemented. In data organization, the data block is fixed and the memory pool is managed based on large memory allocation. The LZW compression algorithm based on dictionary coding is adopted, and the performance of PPM compression algorithm based on statistical coding is compared. For the long string type of English word feature, a meta consonant tree is designed to reduce the uncorrelated retrieval data block. Firstly, aiming at the requirement of column storage index and the character of string. A simple tree structure is designed. Based on the structure of the tree, the state change of the string input process is studied, and the tuples of the finite automatic state machine are defined based on this. Then, the tuples of the tree structure and the finite automatic state machine are defined. The algorithms of tree initialization, storage, string scanning and so on are designed. A query matching algorithm is designed based on the state transfer and state derivation of the finite automatic state machine. In the practical application in column storage, the meta-consonant tree is further improved, and the two-layer structure of meta-consonant root tree and data block element consonant tree is designed. At the same time, the strategy of combining single pattern and double pattern matching is adopted. Second double pattern matching is carried out on the basis of single pattern matching so as to further improve the query efficiency.
【学位授予单位】：华中科技大学
【学位级别】：硕士
【学位授予年份】：2013
【分类号】：TP333

【参考文献】