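Research on Data Storage Models and Parallel Computing Techniques for Query and Analysis of China's Forest Land "One Map"
Published: 2018-01-11 04:21
Source: Chinese Academy of Forestry, doctoral dissertation, 2016. Document type: degree thesis
Related topics: forestry GIS; distributed GIS; distributed spatial database; distributed spatial index; distributed spatial analysis algorithms; distributed task scheduling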
[Abstract]: Forest land resource data reflect the current state of, and changes in, forest land nationwide, and are an important basis for management and integrated decision-making by forestry departments and related enterprises. Since construction began, the national forest land "One Map" system has accumulated remote sensing imagery, forest land boundary delineation data, forest land change data, DEM (Digital Elevation Model) data and more. After preprocessing, these data amount to about 33 TB, making this the largest spatial database in the history of China's forestry sector. As surveys continue and applications diversify, both the volume and the variety of data held in the system keep growing. At this scale, the shortcomings of the existing management approach in efficiency, availability and scalability are increasingly evident, and current research offers no suitable end-to-end solution. Against this background, this thesis studies how large-scale spatial data should be organized in a distributed system and how they can be queried and analyzed. It analyzes the deployment and runtime problems of traditional GIS architectures and of existing distributed GIS research and, taking into account the characteristics of the "One Map" data, designs a system architecture for distributed spatial data storage, spatial query and spatial analysis. The main technologies used are described in detail, and a prototype system was implemented to validate them. The validation results show that the prototype's spatial query, spatial analysis and concurrent spatial access are efficient enough to meet the query-time requirements of the national forest land "One Map" system.
The research work is as follows: (1) The data content and application requirements of the "One Map" system were analyzed, a distributed architecture for the system was established in theory, and three core problems in that architecture were identified: the distributed spatial data storage model, distributed spatial query and spatial analysis algorithms, and the scheduling of distributed spatial computing tasks. (2) Distributed spatial storage model: by designing the organization of key-value data in HDFS (Hadoop Distributed File System), an in-memory distributed database architecture, a spatial data organization built on that distributed database, and a hash-code-based distributed spatial index, the thesis implements a storage model for spatial data in a distributed architecture that avoids the damage to spatial relationships caused by distributed storage in earlier studies. Tests show that this storage model raises spatial query speed to 17 to 70 times that of the traditional approach. (3) Distributed spatial query and spatial analysis algorithms: using Hadoop's MapReduce distributed computing framework, the thesis implements the basic logic of MapReduce-based distributed spatial analysis together with concrete algorithms for several typical spatial analyses. Tests show that this approach lowers the system performance required for complex spatial analysis and greatly improves analysis efficiency when the computational load is large. (4) Scheduling algorithm for distributed spatial computing tasks: the scheduling algorithm is built on the idea of a minimum computation quota per user. It guarantees each spatial computing task a basic amount of computation and, as far as possible, assigns tasks to the computing nodes where their data reside. Tests show that, compared with the default MapReduce scheduler, the algorithm improves average response time by 35-40% and average task duration by 15-20%, and raises the percentage of tasks that run on local data by 5-10%.
The innovations of this thesis are as follows: (1) a system architecture that meets the requirements of distributed spatial data storage and of distributed spatial query and spatial analysis; (2) a physical storage model and a logical storage model for spatial data in a distributed file system, together with a distributed spatial index; (3) the basic logic of spatial query and spatial analysis in a distributed computing framework, along with several typical distributed spatial analysis algorithms; (4) a scheduling workflow for spatial computing tasks in a distributed system architecture.
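The scheduling idea in item (4), a minimum per-user quota combined with a preference for data-local placement, can be illustrated with a small sketch. This is not the thesis's algorithm, whose details the abstract does not give. It is a minimal illustration under stated assumptions: quotas are counted as concurrently running tasks, the most under-served user is served first, and every name here (`Task`, `Scheduler`, `min_quota`, `assign`) is hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Task:
    task_id: str
    user: str
    data_nodes: frozenset  # nodes assumed to hold a replica of the task's input


@dataclass
class Scheduler:
    # Assumption: the quota is a guaranteed number of concurrent tasks per user.
    min_quota: dict
    pending: list = field(default_factory=list)
    running: dict = field(default_factory=lambda: defaultdict(int))

    def submit(self, task):
        self.pending.append(task)

    def _deficit(self, user):
        # How far the user currently is below their guaranteed minimum.
        return self.min_quota.get(user, 0) - self.running[user]

    def assign(self, node):
        """Pick a task for a free slot on `node`, or None if nothing is pending."""
        if not self.pending:
            return None
        # Step 1: serve the user furthest below their minimum quota
        # (ties are broken by user name so the choice is deterministic).
        users = sorted({t.user for t in self.pending})
        user = max(users, key=self._deficit)
        candidates = [t for t in self.pending if t.user == user]
        # Step 2: among that user's tasks, prefer one whose data is on this node;
        # otherwise fall back to the oldest pending task of that user.
        local = [t for t in candidates if node in t.data_nodes]
        task = (local or candidates)[0]
        self.pending.remove(task)
        self.running[user] += 1
        return task

    def finish(self, task):
        # Release the slot so the user's deficit is recomputed on the next call.
        self.running[task.user] -= 1
```

Checking the quota before locality means a user below their minimum is served even when none of their data is on the free node. A real scheduler would probably also wait briefly for a local slot before accepting a remote one, which this sketch leaves out.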
[Degree-granting institution]: Chinese Academy of Forestry
[Degree level]: Doctoral
[Year conferred]: 2016
[Classification number]: S757
Article No.: 1408106
Article link: https://www.wllwen.com/shoufeilunwen/nykjbs/1408106.html