Research on Key Technologies of an Online Analytical Processing System Based on Hadoop

Published: 2018-05-18 23:15

  Topics: online analytical processing + HOLAP; Source: University of Electronic Science and Technology of China, master's thesis, 2016


[Abstract]: In recent years, online analytical processing (OLAP), a technology for querying multidimensional data, has become increasingly important, and OLAP-based multidimensional analysis has become an important basis for decision-making by enterprise managers. Current OLAP research targets storage and processing under a single data model and the optimization of the corresponding query performance. ROLAP (Relational OLAP), built on relational databases, and MOLAP (Multidimensional OLAP), built on multidimensional databases, each rely on a single data organization model and cannot meet the need for heterogeneous data models and low-latency multidimensional queries over data sets of different scales. To address these problems, this thesis improves query planning, query interpretation, and cache-based query optimization across different data organization models, and designs and implements a scalable, efficient, distributed hybrid OLAP (Hybrid OLAP, HOLAP) system. The system aims to serve multidimensional queries over data sets of different scales and to process each query efficiently according to how the underlying multidimensional data is organized. The research covers four aspects.
First, because traditional ROLAP systems cannot efficiently handle multidimensional analysis of large-scale data sets, the thesis proposes a HOLAP system architecture that runs in a Hadoop environment, supports fast multidimensional query analysis over data sets of different scales, supports MDX (Multidimensional Expressions) query interpretation and aggregation over Hive, and includes a multidimensional query optimization method based on an HBase precomputation cache.
Second, to optimize Hive multidimensional queries on large-scale data sets, the thesis uses a segmented, layer-by-layer dimension-reduction aggregation algorithm (S-Redu-D-A) that builds an HBase cube cache, and studies a formalized method for mapping from Hive, a relational-like database, to the data model of HBase, a NoSQL database (Hsql-To-Nosql Formalized Model, Hs-Nos-FM). It proposes and validates a formalized multidimensional cube (Format Multi Cube, F-M-Cube) data storage model suited to efficient HOLAP, which performs well on multidimensional queries over large-scale data sets.
Third, for the two kinds of query plan, the thesis computes and analyzes query planning using indicators such as real-time requirements, data scale, dimension cardinality, storage space, multi-table joins, and query frequency, and proposes a query planning workflow comprising access control, query listening, query analysis, and query dispatch. A comparison of execution times across data of different scales and different multidimensional queries validates the query planning method built on the HOLAP architecture, which performs well on common OLAP multidimensional queries.
Finally, the thesis presents the detailed design and implementation, under the HOLAP architecture, of the query planning method, the query interpretation mechanism, the formalized multidimensional cube construction method, the aggregation cache mechanism, and Hive-backed MDX queries, with the construction algorithm of the formalized method embedded. Testing shows that the system performs well and meets the expected design goals.
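
The abstract names a layer-by-layer dimension-reduction aggregation (S-Redu-D-A) for building the cube cache but gives no details. The following is a minimal Python sketch of the general idea only, not the thesis's algorithm: each coarser cuboid is derived from an already computed cuboid with one more dimension, instead of rescanning the fact rows. The function names (`base_cuboid`, `drop_dimension`, `build_cube`), the in-memory dictionaries, and the sample rows are all assumptions made for illustration; segmentation and HBase storage are not modeled.

```python
from collections import defaultdict


def base_cuboid(rows, dims, measure):
    """Aggregate raw fact rows into the finest cuboid (all dimensions kept)."""
    cells = defaultdict(float)
    for row in rows:
        cells[tuple(row[d] for d in dims)] += row[measure]
    return dict(cells)


def drop_dimension(cells, position):
    """Roll up a cuboid by summing away the dimension at `position`."""
    rolled = defaultdict(float)
    for key, value in cells.items():
        rolled[key[:position] + key[position + 1:]] += value
    return dict(rolled)


def build_cube(rows, dims, measure):
    """Build every cuboid level by level, each from a parent one level finer.

    Returns a dict mapping a tuple of dimension names to that cuboid's cells.
    """
    dims = tuple(dims)
    cube = {dims: base_cuboid(rows, dims, measure)}
    frontier = [dims]
    while frontier:
        next_frontier = []
        for parent in frontier:
            for i in range(len(parent)):
                child = parent[:i] + parent[i + 1:]
                if child not in cube:  # reuse a cuboid if already derived
                    cube[child] = drop_dimension(cube[parent], i)
                    next_frontier.append(child)
        frontier = next_frontier
    return cube


# Hypothetical sample data, for illustration only.
rows = [
    {"region": "east", "product": "a", "year": 2015, "sales": 10.0},
    {"region": "east", "product": "b", "year": 2015, "sales": 5.0},
    {"region": "west", "product": "a", "year": 2016, "sales": 7.0},
]
cube = build_cube(rows, ("region", "product", "year"), "sales")
print(cube[("region",)])
print(cube[()])
```

With three dimensions the sketch produces eight cuboids, from the base cuboid down to the grand total keyed by the empty tuple. In a real deployment each cuboid's cells would presumably be written to a key-value store such as HBase under a row key derived from the dimension names and values; that key design is outside what the abstract specifies.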
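[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TP311.13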

