Research on NoSQL Databases for the Storage and Query of Big Trajectory Data

Published: 2018-06-25 15:10

Topic: NoSQL + big trajectory data; Source: East China Normal University, 2017 master's thesis

【Abstract】: In recent years, with the development of GPS (Global Positioning System) positioning technology and the Internet, the ways of collecting location data have become increasingly diverse, and the volume of trajectory data keeps growing. A growing number of platforms built on location-based services (LBS) provide rich services for production and daily life. These online platforms require high data access efficiency, and traditional relational geographic databases can no longer meet practical needs when managing massive trajectory data. Taking ship navigation trajectory data as its research object, this thesis proposes a scheme for managing large-scale trajectory data with NoSQL databases and designs a spatial index for trajectory data to improve the efficiency of spatio-temporal trajectory queries. It studies the storage of massive trajectory data in NoSQL databases, and spatio-temporal queries over that data, from the following three aspects.
(1) Storage architecture. Trajectory data is managed by combining the Redis and LevelDB databases. Exploiting Redis's high write efficiency and LevelDB's fast reads, Redis serves as a front-end cache database that receives data in real time and holds it in memory; the data is then read from memory and persisted in LevelDB. This scheme meets the need for efficient storage of massive trajectory data while persisting the data on disk, which lowers data management costs.
(2) Trajectory storage model. Using the flexible data structures of key-value databases, trajectory data is stored in sorted sets. The timestamps of each object are merged into periods at 1 h intervals. The Key consists of the object identifier (ship number) and the merged period, and the Value stores the set of dynamic information for that ship within the period. Each item of dynamic information is a concatenated string of the coordinate pair, speed over ground, course, rate of turn, and so on. The Unix timestamp serves as the Score, and the data is sorted by Score.
(3) Spatial index optimization. A grid index is adopted, and the index information is stored in sorted sets. The index design takes both the spatial and the temporal characteristics of trajectory data into account. A latitude-longitude grid is laid out at 0.25° intervals, each cell is numbered by the longitude and latitude of its lower-left corner, and timestamps are merged into periods at 1 h intervals. The Key consists of the grid number and the merged period, the Value is the set of ship numbers that appeared in that cell during the period, and the Score is the corresponding Unix timestamp.
A spatio-temporal query first looks up the index using the numbers of the grid cells that intersect the query window together with the temporal information in the query condition, which yields the matching set of ship numbers and their timestamps. The trajectory sorted sets are then queried by ship number and period to obtain each ship's coordinate pairs and the corresponding timestamps. The results are ordered by Score, that is, by timestamp, which gives the ship's track.
A comparison of storage and query efficiency against Geodatabase, a traditional object-relational geographic database, verifies that the proposed storage architecture effectively improves data access efficiency and that the sorted-set storage model effectively reduces data redundancy and the space required for storage. The index containing spatio-temporal information effectively improves both the management of large-scale trajectory data and the efficiency of spatio-temporal queries.
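The key layout and the two-step query described in the abstract can be illustrated with a small sketch. It is not the thesis's implementation: plain Python dictionaries stand in for the Redis/LevelDB sorted sets, and the key separators, the function names (`hour_bucket`, `cell_id`, `cell_index`, `add_point`, `query`) and the final filtering of points against the query window are assumptions made for illustration. Only the 1 h period, the 0.25° grid, the lower-left-corner cell numbering and the use of the Unix timestamp as Score come from the abstract.

```python
import math
from collections import defaultdict

CELL_DEG = 0.25     # grid spacing stated in the abstract
BUCKET_SEC = 3600   # 1 h period stated in the abstract

# Hypothetical in-memory stand-ins for sorted sets: key -> {member: score}.
tracks = defaultdict(dict)  # "<ship>:<period>" -> {"lon,lat,sog,cog,rot": ts}
index = defaultdict(dict)   # "<cell>:<period>" -> {ship_id: ts}


def hour_bucket(ts):
    """Merge an integer Unix timestamp into the start of its 1 h period."""
    return ts - ts % BUCKET_SEC


def cell_index(lon, lat):
    """Integer column/row of the grid cell containing a point."""
    return math.floor(lon / CELL_DEG), math.floor(lat / CELL_DEG)


def cell_id(ix, iy):
    """Number a cell by the longitude/latitude of its lower-left corner."""
    return f"{ix * CELL_DEG:.2f},{iy * CELL_DEG:.2f}"


def add_point(ship_id, ts, lon, lat, sog, cog, rot):
    """Store one position report and register the ship in the grid index."""
    period = hour_bucket(ts)
    member = f"{lon},{lat},{sog},{cog},{rot}"  # concatenated dynamic info
    tracks[f"{ship_id}:{period}"][member] = ts
    index[f"{cell_id(*cell_index(lon, lat))}:{period}"][ship_id] = ts


def query(min_lon, min_lat, max_lon, max_lat, t_start, t_end):
    """Return {ship_id: [(ts, lon, lat), ...]} ordered by timestamp."""
    ix0, iy0 = cell_index(min_lon, min_lat)
    ix1, iy1 = cell_index(max_lon, max_lat)
    periods = range(hour_bucket(t_start), t_end + 1, BUCKET_SEC)
    # Step 1: index lookup for every intersecting cell and period.
    ships = set()
    for ix in range(ix0, ix1 + 1):
        for iy in range(iy0, iy1 + 1):
            for p in periods:
                ships.update(index.get(f"{cell_id(ix, iy)}:{p}", {}))
    # Step 2: read each candidate ship's points and order them by Score.
    result = {}
    for ship in ships:
        points = []
        for p in periods:
            for member, ts in tracks.get(f"{ship}:{p}", {}).items():
                lon, lat = (float(v) for v in member.split(",")[:2])
                # Assumed refinement: keep points inside window and time range.
                if (t_start <= ts <= t_end and min_lon <= lon <= max_lon
                        and min_lat <= lat <= max_lat):
                    points.append((ts, lon, lat))
        if points:
            result[ship] = sorted(points)
    return result
```

Because a ship number is a unique member of an index set, a ship that reports several times from the same cell within one period keeps only its latest timestamp there. For that reason the sketch treats the index purely as a candidate filter and applies the exact time and window test to the trajectory points themselves.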
【Degree-granting institution】: East China Normal University
【Degree level】: Master's
【Year degree granted】: 2017
【Classification number】: TP311.13

【References】