Cluster-Based Methods for Organizing and Accessing Geospatial Data

Published: 2018-07-29 12:46
[Abstract]: With the arrival of the big data era, the geospatial data handled in the field of geographic information systems (GIS) is growing in volume, in precision, and in computational complexity. Traditional single-machine GIS can no longer cope effectively with complex geographic computing problems that are both data-intensive and compute-intensive. At the same time, new computing hardware architectures such as multi-core and many-core processors, GPUs, and clusters are developing rapidly, with performance rising and cost falling, which offers an effective route to designing and implementing high-performance GIS platforms on these architectures. Compared with traditional GIS, high-performance GIS can process data at a larger scale, integrate data more tightly, and coordinate functions more conveniently, which helps broaden the range of applications and promotes the development of large-scale geographic information applications. Research on high-performance GIS based on new hardware architectures has therefore become the latest direction in the field.

High-performance computing clusters are a typical high-performance computing architecture, and research on building high-performance GIS on such clusters has both theoretical and practical value. Within that research, methods for organizing and accessing large-scale geospatial data in a cluster environment are an important topic. The topic covers geospatial data models, storage schemes, read and write methods, and query and retrieval, and its goal is unified organization and management of, and efficient access to, large-scale geospatial data in a cluster environment. With that goal in mind, this thesis studies geospatial data models, parallel access methods for raster data, and recommendation-based retrieval of vector data on high-performance computing clusters. The work consists of the following four parts:

(1) Construction of a unified organization model for geospatial resources. Because a high-performance computing cluster environment holds many kinds of geospatial resources, the thesis uses the Composite pattern to model geospatial resources abstractly and describe them uniformly. The model hides the type-specific characteristics of different kinds of data, lays the foundation for the subsequent algorithm design, and enables effective organization of large-scale geospatial resources.

(2) A parallel I/O method for geographic raster data. To improve the I/O efficiency of large-scale raster data, the thesis studies parallel I/O modes for geographic raster data in a cluster environment and proposes a parallel I/O method. The method partitions the data on the basis of its metadata, sets up an independent write view for each process, and aggregates the large number of scattered writing processes. This improves read and write performance for large-scale raster data and shortens its I/O time.

(3) Automatic place-name tagging and retrieval of spatial data. Because retrieval accuracy drops as the scale of spatial data grows, the thesis proposes an automatic tagging algorithm for spatial data based on the minimum bounding box and an automatic tagging algorithm for vector data based on the convex hull. By extracting place-name information for spatial data and attaching it as tags, the algorithms make efficient retrieval of large-scale spatial data easier. They are a useful attempt at improving retrieval accuracy and access performance.

(4) Implementation of a prototype geospatial data management system. Building on the preceding work, the thesis designs and implements a prototype geospatial resource management system for high-performance clusters, and uses it to test and verify the proposed methods.
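The metadata-based partitioning in part (2) of the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's actual algorithm, so everything below is an assumption made for illustration: the names `RasterMeta`, `RowBlockView`, and `partition_by_rows` are hypothetical, the raster is assumed to be stored row-major after a fixed-size header, and the split is a plain row-block decomposition. The sketch only derives each process's byte range from metadata; it does not model the aggregation of scattered writes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RasterMeta:
    """Hypothetical raster metadata: size in pixels and bytes per pixel."""
    rows: int
    cols: int
    bytes_per_pixel: int
    header_bytes: int = 0  # assumed fixed-size header before the pixel data


@dataclass(frozen=True)
class RowBlockView:
    """Contiguous block of rows assigned to one process (row-major layout assumed)."""
    rank: int
    first_row: int
    row_count: int
    byte_offset: int  # where this process's region starts in the file
    byte_length: int  # how many bytes this process reads or writes


def partition_by_rows(meta: RasterMeta, num_procs: int) -> List[RowBlockView]:
    """Split the raster into one row block per process using metadata only."""
    if num_procs <= 0:
        raise ValueError("num_procs must be positive")
    # Spread rows as evenly as possible; the first `extra` ranks get one more row.
    base, extra = divmod(meta.rows, num_procs)
    row_bytes = meta.cols * meta.bytes_per_pixel
    views = []
    first_row = 0
    for rank in range(num_procs):
        row_count = base + (1 if rank < extra else 0)
        views.append(
            RowBlockView(
                rank=rank,
                first_row=first_row,
                row_count=row_count,
                byte_offset=meta.header_bytes + first_row * row_bytes,
                byte_length=row_count * row_bytes,
            )
        )
        first_row += row_count
    return views


if __name__ == "__main__":
    meta = RasterMeta(rows=10, cols=4, bytes_per_pixel=2)
    for view in partition_by_rows(meta, num_procs=3):
        print(view)
```

Because the regions are disjoint and together cover the pixel data exactly, each process could write its block without coordinating with the others. In an MPI program, the `byte_offset` computed here is the kind of value that might be passed as the displacement of a per-process file view (for example through MPI-IO's `MPI_File_set_view`), although the abstract does not say which I/O library the thesis uses.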
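[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: P208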


Article ID: 2152717


Article link: https://www.wllwen.com/kejilunwen/dizhicehuilunwen/2152717.html

