Research on Multi-Dimensional Data Indexing for Cloud Computing
[Abstract]: The widespread adoption of cloud computing has caused data volumes to grow explosively, posing new challenges for traditional data management technology. Existing cloud storage systems generally access data through a distributed hash table. This key-value model is efficient for single-dimensional queries but inefficient for multi-dimensional ones: when a user submits a query over multiple attribute columns, the lack of an effective secondary index means that a MapReduce job must scan the entire data set, which reduces query efficiency. Indexing techniques are a recurring topic at top international conferences and journals in the database field. This thesis studies multi-dimensional data indexing in cloud computing environments. Its main contributions are summarized as follows.

1. Existing cloud storage systems mainly support single-key indexes and lack an effective multi-dimensional index, which makes multi-dimensional queries inefficient. To address this, the thesis proposes CloudUB, a multi-dimensional cloud data index scheme based on the UB-tree. CloudUB reduces the dimensionality of the problem with a Z-curve, divides the multi-dimensional space into Z-regions along the Z-curve, and organizes the Z-region information in a B+ tree to build an improved UB-tree index. Using the Z-regions, CloudUB can filter out the parts of the data space that cannot contain query results, which improves query efficiency. In addition, an HBase-based mechanism for index construction and dimensionality reduction is designed. The mechanism stores the B+ tree leaf nodes produced by Z-curve dimensionality reduction in HBase, turning the original multi-dimensional search problem into a key-value query problem that existing cloud storage systems can support, and thereby allows MapReduce jobs to access the index tables with high concurrency.
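The Z-curve dimensionality reduction described for CloudUB can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the function names (`z_address`, `build_index`, `range_query`) are hypothetical, a sorted in-memory list stands in for the HBase index table, and the Z-region splitting of a UB-tree is omitted. It relies only on the fact that a Z-address is monotone in each coordinate, so every point inside a query box has a key between the keys of the box's two corners.

```python
import bisect

def z_address(coords, bits_per_dim):
    """Interleave the bits of non-negative integer coordinates into one key."""
    z = 0
    dims = len(coords)
    for bit in range(bits_per_dim):
        for d, c in enumerate(coords):
            z |= ((c >> bit) & 1) << (bit * dims + d)
    return z

def build_index(points, bits_per_dim):
    """Sorted (key, point) pairs; a stand-in for a key-ordered table."""
    return sorted((z_address(p, bits_per_dim), p) for p in points)

def range_query(index, lo, hi, bits_per_dim):
    """Scan the key range [z(lo), z(hi)], then discard false positives."""
    keys = [z for z, _ in index]
    start = bisect.bisect_left(keys, z_address(lo, bits_per_dim))
    end = bisect.bisect_right(keys, z_address(hi, bits_per_dim))
    return [
        p for _, p in index[start:end]
        if all(l <= c <= h for l, c, h in zip(lo, p, hi))
    ]

points = [(1, 1), (2, 3), (3, 5), (6, 2), (7, 7)]
index = build_index(points, bits_per_dim=3)
result = range_query(index, lo=(2, 2), hi=(6, 5), bits_per_dim=3)
```

A single key-range scan can still touch many keys that lie outside the query box, which is why the final filter is needed. Skipping such stretches of the curve is the role the abstract assigns to Z-region filtering.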
Tests on Hadoop 2.2 with data sets on the order of 10 million records show that the CloudUB index scheme supports flexible and efficient real-time index construction and significantly improves the efficiency of multi-dimensional queries.

2. Based on an in-depth study of data management in cloud computing systems, the thesis proposes KD-R, a two-tier multi-dimensional data index that fits the master-slave management model of cloud computing systems. An R-tree index is built over the local data on each data server, and together these local R-tree indexes form the lower tier of the two-tier index. A subset of the nodes of each local R-tree index is then published to the global index. To decide which local index nodes to publish, the thesis designs an adaptive node publishing algorithm together with a cost model that estimates the indexing cost of local index nodes. The cost model periodically examines the index nodes on each local data server, and the adaptive node publishing algorithm then adjusts the set of published nodes, dynamically optimizing the KD-R index. Experimental results show that the multi-dimensional query algorithm based on the KD-R index achieves high memory utilization and query efficiency.

3. Because user needs are elastic and query dimensions can be extended dynamically in cloud computing systems, the thesis proposes CB-index, a multi-dimensional cloud data index based on the Chord overlay network and partitioned bitmaps. The thesis designs a partitioned bitmap encoding mechanism, uses the partitioned bitmaps to build a local data index on each local data server, and integrates the local index nodes with the Chord overlay network. An adaptive index node adjustment algorithm, a multi-dimensional query algorithm, and an index maintenance algorithm are also designed.
Experimental results show that CB-index is efficient for multi-dimensional queries and avoids complete reconstruction of the index structure. It supports flexible expansion of index dimensions and can meet users' dynamic query requirements in cloud computing environments.
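The abstract does not give the partitioned bitmap encoding used by CB-index, so the following Python sketch only illustrates the general idea with a generic binned bitmap index. The class name, method names, and binning scheme are all assumptions, and the Chord overlay is left out entirely. It shows why adding a query dimension need not rebuild the existing index: each dimension owns its own set of bitmaps.

```python
class BinnedBitmapIndex:
    """Generic binned bitmap index over a list of records (illustrative only)."""

    def __init__(self, records):
        self.records = records   # list of dicts: attribute -> numeric value
        self.bitmaps = {}        # attribute -> list of int bitmaps, one per bin
        self.layout = {}         # attribute -> (low, bin width, bin count)

    def _bin(self, attr, value):
        low, width, bins = self.layout[attr]
        return max(0, min(int((value - low) / width), bins - 1))

    def add_dimension(self, attr, low, high, bins):
        """Index one more attribute without touching existing bitmaps."""
        self.layout[attr] = (low, (high - low) / bins, bins)
        maps = [0] * bins
        for i, rec in enumerate(self.records):
            maps[self._bin(attr, rec[attr])] |= 1 << i
        self.bitmaps[attr] = maps

    def query(self, ranges):
        """ranges: attribute -> (lo, hi). Returns matching record positions."""
        candidates = (1 << len(self.records)) - 1
        for attr, (lo, hi) in ranges.items():
            mask = 0
            for b in range(self._bin(attr, lo), self._bin(attr, hi) + 1):
                mask |= self.bitmaps[attr][b]   # OR the bins the range touches
            candidates &= mask                  # AND across dimensions
        return [
            i for i, rec in enumerate(self.records)
            if (candidates >> i) & 1
            and all(lo <= rec[a] <= hi for a, (lo, hi) in ranges.items())
        ]

records = [
    {"price": 10, "rating": 4.5},
    {"price": 55, "rating": 3.0},
    {"price": 80, "rating": 4.8},
    {"price": 30, "rating": 2.0},
]
idx = BinnedBitmapIndex(records)
idx.add_dimension("price", 0, 100, 4)
one_dim = idx.query({"price": (20, 60)})
idx.add_dimension("rating", 0, 5, 5)   # new dimension, no rebuild of "price"
two_dim = idx.query({"price": (20, 90), "rating": (2.5, 5)})
```

Because bins are coarser than the query bounds, the bitmap step only narrows the candidate set, and the final comparison against the record values removes false positives.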
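[Degree-granting institution]: University of Electronic Science and Technology of China (电子科技大学)
[Degree level]: Doctoral
[Year degree granted]: 2016
[Classification number]: TP311.13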
[Citation]: 何婧 (He Jing); Research on Multi-Dimensional Data Indexing for Cloud Computing [D]; University of Electronic Science and Technology of China; 2016
Article ID: 2193230
Article link: https://www.wllwen.com/shoufeilunwen/xxkjbs/2193230.html