Analysis of Airline Ticket Settlement Data Based on a Multidimensional Data Model

Published: 2018-02-04 13:52

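Keywords: ticket settlement data; data warehouse; iceberg cube; bitmap index; distributed computing; data mining. Source: Civil Aviation University of China, master's thesis, 2017. Type: degree thesis.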


[Abstract]: With the rapid development of China's civil aviation industry, more and more passengers are choosing to fly. Airline passenger volume is growing quickly, and the ticket settlement data it generates is growing explosively. Years of accumulation have made this data both high-dimensional and large in volume, and analyzing it puts traditional BI (Business Intelligence) systems under severe performance pressure. Building a multidimensional data cube over ticket settlement data and using distributed computing to speed up queries and analysis is therefore of considerable practical value. To reduce the running time of iceberg cube aggregation over multidimensional data in BI systems, this thesis proposes DPBUC_BI (Dynamic Pruning based BUC_BI), an improved algorithm based on bitmap indexes. Exploiting the column-wise organization of bitmap indexes, the algorithm redefines the grouping (partitioning) step of the BUC (Bottom-Up Computation) algorithm, which speeds up data loading and querying, and it computes aggregates with logical bit operations, which improves computational performance. Because ticket settlement data is clustered on some dimensions, a dynamic pruning strategy is added that further speeds up iceberg cube computation while preserving the algorithm's correctness. Finally, DPBUC_BI is applied to iceberg cube computation over ticket settlement data; the experimental results show that it improves computational performance and runs considerably faster than the classical BUC algorithm.
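The general idea behind a bitmap-based BUC (partition a group by ANDing bitmaps, aggregate by counting set bits, and stop recursing once a cell falls below the iceberg threshold) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the thesis's DPBUC_BI: it implements only BUC's standard minimum-count pruning for a COUNT threshold, not the thesis's dynamic pruning strategy, and the function names, the three dimensions (origin, destination, cabin) and the fare values are all hypothetical.

```python
from collections import defaultdict

def build_bitmap_index(rows, num_dims):
    """One bitmap (a Python int) per (dimension, value): bit i is set if row i has that value."""
    index = [defaultdict(int) for _ in range(num_dims)]
    for i, row in enumerate(rows):
        for d in range(num_dims):
            index[d][row[d]] |= 1 << i
    return index

def popcount(bits):
    """COUNT aggregate: number of set bits (matching rows) in a bitmap."""
    return bin(bits).count("1")

def measure_sum(bits, measures):
    """SUM aggregate: add up the measure of every row whose bit is set."""
    total, i = 0, 0
    while bits:
        if bits & 1:
            total += measures[i]
        bits >>= 1
        i += 1
    return total

def buc_bitmap(rows, measures, num_dims, min_count):
    """Iceberg cube with HAVING COUNT(*) >= min_count, computed BUC-style over bitmaps.

    Returns {cell: (count, sum)}; '*' marks a dimension that is aggregated away.
    (Hypothetical helper for illustration only.)
    """
    index = build_bitmap_index(rows, num_dims)
    result = {}

    def recurse(bits, count, start, cell):
        result[tuple(cell)] = (count, measure_sum(bits, measures))
        for d in range(start, num_dims):
            for value, value_bits in index[d].items():
                sub = bits & value_bits        # partitioning is a bitwise AND, no sorting
                sub_count = popcount(sub)
                # Skip cells below the threshold: their refinements can only be smaller.
                if sub_count >= min_count:
                    cell[d] = value
                    recurse(sub, sub_count, d + 1, cell)
                    cell[d] = "*"

    all_rows = (1 << len(rows)) - 1
    total = popcount(all_rows)
    if total >= min_count:
        recurse(all_rows, total, 0, ["*"] * num_dims)
    return result

# Hypothetical settlement records: (origin, destination, cabin) plus a fare measure.
rows = [("PEK", "SHA", "Y"), ("PEK", "SHA", "F"), ("PEK", "CAN", "Y"), ("CAN", "SHA", "Y")]
fares = [1200, 3000, 900, 800]
cube = buc_bitmap(rows, fares, num_dims=3, min_count=2)
for cell, (count, total_fare) in sorted(cube.items()):
    print(cell, count, total_fare)
```

With `min_count=2`, single-row cells such as origin `CAN` are pruned before any deeper recursion, and seven cells survive. Using Python integers as bitsets keeps the sketch short; a real implementation would use fixed-width word arrays so that the AND and popcount steps map onto machine instructions.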
To store and analyze massive ticket settlement data across multiple dimensions more effectively, the thesis re-implements the traditional ticket settlement analysis platform on a distributed computing framework. Data is migrated with Flume and Sqoop, a data warehouse with a fact constellation schema is built on top of it, and the characteristics of two storage formats, ORC and Parquet, are compared. Because bitmap indexes occupy considerable space, they are compressed with the EWAH (Enhanced Word-Aligned Hybrid) algorithm, and a multidimensional aggregation algorithm and a multidimensional association rule mining algorithm are implemented on the MapReduce model. The experimental results show that the distributed ticket settlement analysis platform not only completes simple statistical analyses quickly but also runs the parallel association rule mining algorithm well.
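EWAH compresses a bitmap by splitting it into machine words and replacing runs of all-0 or all-1 ("clean") words with a count, while keeping mixed ("dirty") words as literals. The sketch below illustrates that word-aligned run-length idea with 32-bit words. It is deliberately simplified: each marker is kept as a Python tuple (run bit, run length, literal words) instead of being packed into a single word with bounded bit fields as the real EWAH format does, and it does not implement logical operations on the compressed form. All function names and the example bitmap are hypothetical.

```python
W = 32                      # word size in bits
FULL = (1 << W) - 1         # an all-ones word

def to_words(bitmap, nbits):
    """Split an integer bitmap into W-bit words, least significant word first."""
    nwords = (nbits + W - 1) // W
    return [(bitmap >> (W * k)) & FULL for k in range(nwords)]

def from_words(words):
    """Reassemble W-bit words (least significant first) into an integer bitmap."""
    value = 0
    for k, word in enumerate(words):
        value |= word << (W * k)
    return value

def ewah_encode(words):
    """Encode words as markers (run_bit, run_len, literals): a run of clean words,
    followed by the dirty (mixed) words that come after it, stored verbatim."""
    markers, i, n = [], 0, len(words)
    while i < n:
        run_bit, run_len = 0, 0
        if words[i] in (0, FULL):
            clean = words[i]
            run_bit = 1 if clean == FULL else 0
            while i < n and words[i] == clean:
                run_len += 1
                i += 1
        literals = []
        while i < n and words[i] not in (0, FULL):
            literals.append(words[i])
            i += 1
        markers.append((run_bit, run_len, literals))
    return markers

def ewah_decode(markers):
    """Expand each marker back into its clean run followed by its literal words."""
    words = []
    for run_bit, run_len, literals in markers:
        words.extend([FULL if run_bit else 0] * run_len)
        words.extend(literals)
    return words

def compressed_size(markers):
    """Size in words if each marker occupied one word, as in real EWAH."""
    return sum(1 + len(literals) for _, _, literals in markers)

# Hypothetical sparse bitmap index column: 10,000 rows, only rows 5, 6 and 9000 match.
nbits = 10_000
bitmap = (1 << 5) | (1 << 6) | (1 << 9000)
words = to_words(bitmap, nbits)
markers = ewah_encode(words)
assert from_words(ewah_decode(markers)) == bitmap   # lossless round trip
print(len(words), "->", compressed_size(markers))   # 313 -> 5
```

For this sparse column, 313 raw words shrink to three markers plus two literal words. Production EWAH libraries also run AND/OR directly on the compressed streams, which is what lets compressed bitmap indexes keep the logical-operation aggregation described above.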
[Degree-granting institution]: Civil Aviation University of China
[Degree]: Master's
[Year conferred]: 2017
[Classification number]: F560.5; TP311.13


Link: https://www.wllwen.com/shoufeilunwen/xixikjs/1490368.html

