Research on OLAP Query Technology Based on Distributed Memory
Published: 2018-08-26 07:08
[Abstract]: Query and analysis of massive data in cloud computing environments currently relies mainly on batch processing and offline computation, which makes online, ad hoc, and interactive analysis hard to achieve. The data cube is an important data model in data warehousing and business intelligence; its cells form lattice-structured data through their roll-up and drill-down relationships. To improve query and analysis performance, this thesis studies online, efficient OLAP techniques based on the data cube lattice model and a distributed in-memory architecture. The work covers two areas. (1) Treating lattice-structured data as graph-structured data, and starting from its statistical characteristics and regularities, the thesis studies models of lattice-structured data using experimental statistics, classical analytical models such as complex networks, and the concept hierarchy of the lattice. On that basis, it draws on current graph partitioning techniques to study how to partition and store lattice-structured data across multiple nodes so that communication cost stays low and cluster load is balanced. (2) The hierarchical closed cube extends the closed cube model; it is a semantic compression of the data cube that effectively reduces its storage space. Closed data cubes are stored in two structures, arrays and lattices, and the thesis uses the level information of closed cells and the tuple-cover relationships among them to study distributed storage and query methods for hierarchical closed cubes under both structures. Finally, an OLAP query prototype system was built on Spark, an in-memory distributed computing framework, and was validated and analyzed experimentally.
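To make the lattice and closed-cube ideas in the abstract concrete, here is a minimal single-machine sketch in plain Python. It is not the thesis's Spark implementation. The fact table, the `*` wildcard, the COUNT measure, and the function names `build_cube`, `parents`, and `closed_cells` are all assumptions chosen for illustration. A cell is treated as closed when no drill-down cell covers exactly the same set of tuples.

```python
from collections import defaultdict
from itertools import product

ALL = "*"  # hypothetical wildcard marking a dimension that has been rolled up


def build_cube(rows):
    """Map every non-empty cube cell to the set of row indices it covers."""
    cover = defaultdict(set)
    for idx, row in enumerate(rows):
        # Each row contributes to 2^d cells: keep or roll up each dimension.
        for mask in product((False, True), repeat=len(row)):
            cell = tuple(v if keep else ALL for v, keep in zip(row, mask))
            cover[cell].add(idx)
    return dict(cover)


def parents(cell):
    """Cells reached by rolling up exactly one dimension (lattice edges)."""
    return [
        cell[:i] + (ALL,) + cell[i + 1:]
        for i, v in enumerate(cell)
        if v != ALL
    ]


def closed_cells(cover):
    """Return {cell: count} for cells with no drill-down child covering the same rows.

    Checking one-step children is enough: if a deeper descendant covered the
    same rows, every cell on the path between them would cover them as well.
    """
    not_closed = set()
    for cell, covered in cover.items():
        for parent in parents(cell):
            if cover[parent] == covered:
                not_closed.add(parent)
    return {c: len(r) for c, r in cover.items() if c not in not_closed}


# Illustrative three-dimensional fact table (assumed data, not from the thesis).
rows = [("a1", "b1", "c1"), ("a1", "b1", "c2"), ("a2", "b2", "c1")]
cube = build_cube(rows)
closed = closed_cells(cube)
print(len(cube), "cells in the full cube,", len(closed), "closed cells")
```

On this toy table the full cube has 18 non-empty cells and the closed cube keeps 6 of them. For example, `("a1", "*", "*")` is dropped because drilling down to `("a1", "b1", "*")` covers the same two rows, so its count can be recovered from the closed cell. In a distributed setting the `parents` edges are what a graph partitioner would try to keep within a single node.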
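[Degree-granting institution]: Kunming University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP333
Article ID: 2204107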
Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2204107.html