Research and Implementation of Design and Query Optimization for a Cloud-Platform-Based Expressway Traffic Data Warehouse
Published: 2019-01-01 17:38
[Abstract]: With the development of Internet of Things technology and the growing number of intelligent sensors, the volume of data collected by the transportation industry is increasing rapidly. Expressway toll collection systems in particular generate massive amounts of toll station data every day. Analyzing this structured data yields highly valuable information, such as expressway traffic volume, the spatio-temporal distribution of transport load, the expressway transportation prosperity index, and year-over-year and month-over-month comparisons of toll reports. This information supports sound decision-making by expressway managers. Most management systems currently used by transportation departments are built on Oracle databases. Faced with the ever-growing volume of toll station data, these systems suffer from complex and time-consuming data integration, dependence on specialist staff, and slow queries. This thesis therefore studies the design and query optimization of an expressway traffic data warehouse on a cloud platform. First, based on the characteristics of expressway toll station data, it designs a data warehouse for massive toll station data, built in three core stages: data extraction, data preprocessing, and data processing. Second, by comparing the query characteristics of Hive and Impala and analyzing both the partition granularity of the data warehouse and the business characteristics of expressway management, it proposes three query optimization methods for the data warehouse. Third, it builds the data warehouse on the distributed file system HDFS, the data warehouse tool Hive, and the query engine Impala, and designs and implements a data visualization platform for expressway management that provides data query and thematic analysis functions. Finally, it validates the functionality and performance of the data warehouse with real expressway toll station data. The results show that the proposed query optimization methods effectively improve query efficiency and shorten query time.
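The abstract ties its query optimizations to the partition granularity of the warehouse. Hive and Impala skip partitions whose partition-column values fall outside a query's `WHERE` filter, so the granularity decides how much data a date-range query still has to read. The sketch below illustrates that trade-off under stated assumptions. It is not the thesis's method: it assumes toll records partitioned only by pass date at year, month, or day level, and every function name in it is hypothetical. For a given date range, it counts the partitions that survive pruning and the days of data they hold.

```python
from __future__ import annotations

import calendar
from datetime import date, timedelta

# Hypothetical layout (an assumption, not taken from the thesis): toll records
# are partitioned only by pass date, at "year", "month" or "day" granularity.


def partition_key(d: date, granularity: str) -> tuple[int, ...]:
    """Map a pass date to the key of the partition that stores it."""
    if granularity == "year":
        return (d.year,)
    if granularity == "month":
        return (d.year, d.month)
    if granularity == "day":
        return (d.year, d.month, d.day)
    raise ValueError(f"unknown granularity: {granularity}")


def days_in_partition(key: tuple[int, ...]) -> int:
    """Number of days of data held by one partition."""
    if len(key) == 1:
        return 366 if calendar.isleap(key[0]) else 365
    if len(key) == 2:
        return calendar.monthrange(key[0], key[1])[1]
    return 1


def pruned_partitions(start: date, end: date, granularity: str) -> set[tuple[int, ...]]:
    """Partitions a query filtered on start <= pass_date <= end must still read."""
    keys = set()
    d = start
    while d <= end:
        keys.add(partition_key(d, granularity))
        d += timedelta(days=1)
    return keys


def days_scanned(start: date, end: date, granularity: str) -> int:
    """Days of data actually read after partition pruning."""
    return sum(days_in_partition(k) for k in pruned_partitions(start, end, granularity))


if __name__ == "__main__":
    # A one-week query that crosses a month boundary in a leap year.
    start, end = date(2016, 3, 28), date(2016, 4, 3)
    for g in ("year", "month", "day"):
        print(g, len(pruned_partitions(start, end, g)), days_scanned(start, end, g))
```

Running it prints `year 1 366`, `month 2 61` and `day 7 7`. Finer partitions cut the data read for short-range queries. The cost is more partitions to store: 366 per year at day level versus 12 at month level, which adds metastore metadata and tends to produce many small HDFS files. That is one reason granularity is usually matched to the typical query range rather than made as fine as possible.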
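[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP311.13; TP393.09
Article number: 2397894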
Article link: https://www.wllwen.com/guanlilunwen/ydhl/2397894.html