Research and Implementation of Cloud-Platform-Based Highway Traffic Data Warehouse Design and Query Optimization

Published: 2019-01-01 17:38

[Abstract]: With the development of Internet of Things technology and the growing number of intelligent sensors, the volume of data collected by the transportation industry is increasing rapidly. Highway toll collection systems in particular generate massive amounts of toll station data every day. Analyzing this structured data yields valuable information such as highway traffic flow, the spatiotemporal distribution of freight volume, the highway transportation prosperity index, and year-over-year and period-over-period comparisons of toll reports, all of which support sound decision-making by highway managers. At present, most management systems used by transportation departments are backed by Oracle databases. As the volume of toll station data keeps growing, these systems suffer from complex and slow data integration, dependence on specialist staff, and slow queries. This thesis therefore studies the design of a highway traffic data warehouse on a cloud platform and techniques for optimizing its queries. First, based on the characteristics of highway toll station data, it designs a data warehouse for massive toll station data whose construction comprises three core stages: data extraction, data preprocessing, and data processing. Second, by comparing the query characteristics of Hive and Impala and analyzing the warehouse's partition granularity together with the business characteristics of highway management, it proposes three query optimization methods for the data warehouse. Third, it implements the data warehouse on the distributed file system HDFS, the data warehouse tool Hive, and the query engine Impala, and designs and implements a data visualization platform for highway management that provides data query and thematic analysis functions. Finally, it verifies the functionality and performance of the data warehouse using real highway toll station data. The results show that the proposed query optimization methods effectively improve query efficiency and shorten query time.
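The abstract mentions partition granularity as one factor in query optimization but does not spell out the three methods. As a rough illustration only, and not the thesis's actual design, the sketch below counts how many partitions a date-range query would touch under day-level versus month-level partitioning, and builds a query that filters on the partition column so an engine such as Hive or Impala can skip unrelated partitions. The table name `toll_records`, the partition column `dt`, and the column `station_id` are all hypothetical.

```python
from datetime import date, timedelta


def day_partitions(start: date, end: date) -> list[str]:
    """Partition keys touched by [start, end] when the table is partitioned by day."""
    n_days = (end - start).days + 1
    return [(start + timedelta(days=i)).strftime("%Y-%m-%d") for i in range(n_days)]


def month_partitions(start: date, end: date) -> list[str]:
    """Partition keys touched by [start, end] when the table is partitioned by month."""
    keys = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        keys.append(f"{year:04d}-{month:02d}")
        # Advance to the next calendar month, rolling over the year in December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return keys


def pruned_query(table: str, start: date, end: date) -> str:
    """Build a query that filters on the (hypothetical) partition column `dt`."""
    return (
        f"SELECT station_id, COUNT(*) AS vehicles FROM {table} "
        f"WHERE dt BETWEEN '{start:%Y-%m-%d}' AND '{end:%Y-%m-%d}' "
        f"GROUP BY station_id"
    )


if __name__ == "__main__":
    start, end = date(2017, 1, 30), date(2017, 2, 2)
    print(len(day_partitions(start, end)), "day partitions")
    print(len(month_partitions(start, end)), "month partitions")
    print(pruned_query("toll_records", start, end))
```

Finer partitions let a short-range query read less data, while coarser partitions keep the partition count and metadata overhead low for long-range reports; which trade-off suits a given workload is the kind of question the partition-granularity analysis addresses.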
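[Degree-granting institution]: Beijing University of Posts and Telecommunications (北京邮电大学)
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP311.13;TP393.09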

[References]