内存索引的压缩存储及优化研究
发布时间:2018-07-14 15:22
【摘要】:随着计算机和数据库技术的迅猛发展,人类已进入信息时代,需要存储的数据大大增长,已远远超出单台服务器的承受范围。为了满足数据的检索需要,,大型索引系统往往建立在分布式系统之上,但在某些需要高响应低延迟与处理灵活性的场景下,分布式系统具有其本质性的困难。因此提升单机的存储及处理能力,特别是对于高配置服务器来说有不可替代的意义。 针对现代服务器的硬件架构与内存资源的稀缺性,本文提出了一种内存索引数据结构LC-Tree。针对CPU高速缓存、分支预测、多核下内存伪共享等硬件特征调整优化LC-Tree数据结构实现与内存布局。通过构造一个逻辑上的256叉树作为上层结构,分支节点结构利用位图索引、直接索引等方式迅速定位到底层节点。底层叶子节点在内存中连续排放有利于使用数据压缩算法节省有限的内存资源。 LC-Tree数据结构在实现上,结合计算机硬件特性与压缩算法,有效地在压缩率、解压时间和动态性能之间达到平衡,通过实现索引动态更新来支持数据实时更新。针对无法放入单机的大规模数据的存储和检索,根据业务场景与分布式系统的设计原则,提出索引存储的分布式解决方案,以满足大数据下的数据检索需求。
[Abstract]:With the rapid development of computer and database technology, mankind has entered the information age, and the data that needs to be stored has greatly increased, far beyond the bearing range of a single server. In order to meet the needs of data retrieval, large index systems are often built on distributed systems, but in some scenarios that require high response and low latency and processing flexibility, distributed systems are inherently difficult. Therefore, improving the storage and processing capability of single machine, especially for high configuration server, has irreplaceable significance. Aiming at the scarcity of memory resources and hardware architecture of modern server, this paper proposes a memory indexed data structure LC-Tree. The implementation of LC-Tree data structure and memory layout are optimized for CPU cache, branch prediction and memory pseudo-sharing under multi-core. By constructing a logical 256 fork tree as the upper structure, the branch node structure uses bitmap index and direct index to locate the underlying node quickly. Continuous discharge of leaf nodes in memory can save limited memory resources by using data compression algorithm. LC-Tree data structure is implemented in combination with computer hardware characteristics and compression algorithm. There is a balance between decompression time and dynamic performance, and real-time updating of data is supported by dynamic update of index. According to the design principle of business scene and distributed system, the distributed solution of index storage is proposed to meet the data retrieval requirements in big data.
【学位授予单位】:武汉理工大学
【学位级别】:硕士
【学位授予年份】:2014
【分类号】:TP311.13;TP333
本文编号:2122076
[Abstract]:With the rapid development of computer and database technology, mankind has entered the information age, and the data that needs to be stored has greatly increased, far beyond the bearing range of a single server. In order to meet the needs of data retrieval, large index systems are often built on distributed systems, but in some scenarios that require high response and low latency and processing flexibility, distributed systems are inherently difficult. Therefore, improving the storage and processing capability of single machine, especially for high configuration server, has irreplaceable significance. Aiming at the scarcity of memory resources and hardware architecture of modern server, this paper proposes a memory indexed data structure LC-Tree. The implementation of LC-Tree data structure and memory layout are optimized for CPU cache, branch prediction and memory pseudo-sharing under multi-core. By constructing a logical 256 fork tree as the upper structure, the branch node structure uses bitmap index and direct index to locate the underlying node quickly. Continuous discharge of leaf nodes in memory can save limited memory resources by using data compression algorithm. LC-Tree data structure is implemented in combination with computer hardware characteristics and compression algorithm. There is a balance between decompression time and dynamic performance, and real-time updating of data is supported by dynamic update of index. According to the design principle of business scene and distributed system, the distributed solution of index storage is proposed to meet the data retrieval requirements in big data.
【学位授予单位】:武汉理工大学
【学位级别】:硕士
【学位授予年份】:2014
【分类号】:TP311.13;TP333
【参考文献】
相关期刊论文 前10条
1 赵园春;李成名;赵春宇;;基于R树的分布式并行空间索引机制研究[J];地理与地理信息科学;2007年06期
2 阚君满;;基于改进哈夫曼编码的全文索引结构压缩算法[J];吉林大学学报(信息科学版);2011年05期
3 赵鹏;一种基于压缩的全文本数据库倒排索引方法[J];黑龙江大学自然科学学报;2005年03期
4 骆吉洲;李建中;;一种索引结构的压缩存储及其查询处理技术[J];计算机工程与应用;2007年08期
5 何小苑;闵华清;;基于聚类的Hilbert R-树空间索引算法[J];计算机工程;2009年09期
6 张明波,陆锋,申排伟,程昌秀;R树家族的演变和发展[J];计算机学报;2005年03期
7 王梅;杨思箫;乐嘉锦;;列存储数据库中压缩位图索引技术[J];计算机工程;2012年18期
8 管建和;甘剑峰;;基于Lucene全文检索引擎的应用研究与实现[J];计算机工程与设计;2007年02期
9 陈占龙;吴信才;谢忠;吴亮;;分布式空间数据索引机制研究[J];微电子学与计算机;2007年10期
10 郑丽英;数据结构Trie及其应用[J];现代计算机(专业版);2004年08期
相关博士学位论文 前1条
1 潘鹏;时空数据库的索引机制及查询策略研究[D];华中科技大学;2007年
本文编号:2122076
本文链接:https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2122076.html