Research on Key Issues in GML Spatio-Temporal Clustering and Spatio-Temporal Sequence Similarity Query

Published: 2018-05-07 21:45

Topic: GML + spatio-temporal clustering; Source: Jiangxi University of Science and Technology, master's thesis, 2013

[Abstract]: Despite the rapid development of modern information technology, GIS, an important part of that technology, still faces problems of data sharing and interoperability, which hinder communication and exchange in GIS work. To address this, the OGC released the GML specification, which serves as a bridge between different kinds of GIS data and connects the GIS community.

GML (Geography Markup Language) is a geographic information encoding specification for networked environments. With the continued development of computer, network, and database technology, it has come into wide use in many fields. As the LBS (Location Based Service) market expands, large volumes of GML spatio-temporal data keep emerging. GML brings convenience, but it also raises a series of problems. The most prominent is information overload: the utilization rate of information is low, and the processing it requires exceeds what people can handle. Traditional data mining techniques target structured data and cannot cope with GML data, which changes over time and is hierarchically structured. This thesis therefore focuses on GML spatio-temporal clustering.

Time and space form the basic frame of reference for everything in the world, so spatio-temporal sequence data is widespread in real life, and its volume is growing geometrically. A great deal of valuable information lies behind this data. How to extract knowledge from massive spatio-temporal data, analyze the results, and give decision makers useful advice has become an urgent problem in spatial data mining. Research on GML spatio-temporal sequence similarity query still has considerable room for development, especially for massive GML data.

In view of the current state of research on GML spatio-temporal clustering and spatio-temporal sequence similarity query, this thesis carries out the following work:

(1) It describes GML spatio-temporal data models in detail. Several approaches to modeling spatio-temporal data are described, and an HBase-based GML spatio-temporal data model is presented for storing massive data.

(2) It studies GML spatio-temporal clustering algorithms. The classical clustering algorithms are reviewed (partitioning methods, hierarchical methods, density-based algorithms, grid-based algorithms, and model-based algorithms). Building on these, the thesis proposes a K-means clustering algorithm based on spatial proximity relations and a GML spatio-temporal clustering algorithm based on spatial neighborhoods, and validates each experimentally. The spatial-proximity K-means algorithm is applied to the verification of spatial correlation in regional economic development, to spatial cluster analysis of regional economic development, and to spatio-temporal cluster analysis of regional economic development.

(3) It studies GML spatio-temporal sequence similarity query in depth, in particular GML time series similarity query based on spatial proximity relations. The study uses national economic statistics for 31 provinces and municipalities in mainland China covering the 16 years from 1997 to 2012. GDP1per, GDP2per, and GDP3per are each standardized before the similarity measure is computed, and the analysis reflects both the level of regional economic development and the structure of the three major industries in each region.
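The abstract notes that the GDP series are standardized before the similarity measure is computed, but it does not say which standardization or which measure the thesis uses. The sketch below is therefore only one plausible reading: it assumes z-score standardization and Euclidean distance, and the function names (`zscore`, `euclidean`, `most_similar`) and region labels are hypothetical. It does not model the spatial proximity relations that the thesis also takes into account.

```python
from math import sqrt
from statistics import mean, pstdev


def zscore(series):
    """Standardize a series to zero mean and unit (population) variance.

    Assumption: the thesis may use a different standardization.
    """
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        # A constant series carries no shape information.
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(query, series_by_region, k=1):
    """Return the k regions whose standardized series lie closest to the query's.

    The result is a list of (region, distance) pairs, nearest first.
    """
    standardized = {name: zscore(s) for name, s in series_by_region.items()}
    q = standardized[query]
    ranked = sorted(
        ((name, euclidean(q, s)) for name, s in standardized.items() if name != query),
        key=lambda pair: pair[1],
    )
    return ranked[:k]


if __name__ == "__main__":
    # Made-up illustrative values, not data from the thesis.
    demo = {
        "region_a": [1.0, 2.0, 3.0, 4.0],
        "region_b": [10.0, 20.0, 30.0, 40.0],
        "region_c": [4.0, 3.0, 2.0, 1.0],
    }
    print(most_similar("region_a", demo, k=2))
```

Standardizing first means that two regions with very different absolute values but the same growth pattern are treated as similar, which is presumably why the step comes before the distance calculation.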
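[Degree-granting institution]: Jiangxi University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: P208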

[References]