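Research on Parallel Computing Methods for Spatial Big Data in Cloud Environments Considering the Distribution Characteristics of Spatial Subdomains
Keywords: spatial big data; spatial subdomain; neighborhood spatial operation; spatial partitioning. Source: Zhejiang University, 2017 doctoral dissertation. Document type: degree thesis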
[Abstract]: The rapid development of integrated space-air-ground observation and mobile Internet technologies has brought explosive growth in spatial big data, forcing the computing model of spatial analysis to shift from centralized processing and single-user human-computer interaction toward high scalability, high efficiency, and multi-source data. Using cloud computing resources to process spatial big data in parallel is an important way to accomplish this shift. The parallel computing paradigm in cloud environments is essentially single-instruction, multiple-data parallelism, which requires the dataset to be partitioned into independent, shared-nothing parts that are processed in parallel. Spatial data, however, is heterogeneously organized, unevenly distributed, and strongly correlated between entities, so it cannot be split directly to fit this paradigm. Traditional parallel spatial computing methods mostly target specific application scenarios, give little consideration to the relationships between spatial entities and their distribution characteristics, and have not formed a parallel computing methodology that covers the organization and storage, partitioning and computation, and efficiency optimization of spatial big data. To address these problems, this thesis studies a parallel computing methodology for spatial big data that takes the distribution characteristics of spatial subdomains into account. It proposes data partitioning strategies and parallelization methods for spatial operations with different spatial subdomain distribution characteristics and, taking real spatial computing scenarios as examples, verifies the correctness and efficiency of the proposed methods on real spatial big datasets, providing methodological support and reference cases for computing on hundred-million-scale spatial big data in cloud environments. The research is summarized as follows:
(1) Following the requirements of the parallel computing paradigm in cloud environments, a unified workflow representation for the cloud storage organization of spatial data and for parallel spatial computing is designed. The classification of spatial operations with respect to data partitioning, and the spatial subdomain distribution characteristics of each class, are studied. A method for estimating the computational load of tasks on spatial subdomains is proposed. Together these form a complete parallel spatial computing methodology.
(2) On this basis, and according to the characteristics of local spatial operations, two general data partitioning methods for local spatial operations are designed, one based on default subdomains and one based on grid subdomains. Using these two methods, parallel rendering of spatial frequency maps and of multi-level pyramid vector maps is implemented. Comparative tests on billion-scale global points of interest and million-scale vector polygons verify the applicability and efficiency of the methods.
(3) Three forms of regular spatial subdomain distribution for neighborhood spatial operations are proposed: regular spatial subdomains with a range distribution, regular spatial subdomains with a spatio-temporal range distribution, and regular spatial subdomains caused by overlay errors in heterogeneous data. A parallelization method is designed for spatial operations with each of these three distribution characteristics, and the methods are validated with spatial distance join, spatio-temporal hot spot analysis, and large-scale three-dimensional terrain surface area calculation as case studies.
(4) For neighborhood spatial operations whose spatial subdomains are irregularly distributed, taking the K-nearest-neighbor join as an example, two methods for determining the extent of irregular spatial subdomains are proposed, one based on uniform grid expansion and one based on Voronoi. A parallel K-nearest-neighbor join algorithm is implemented on this basis, and performance experiments compare the applicability and efficiency of the two methods.
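The abstract does not give algorithmic details, so the following is only a minimal sketch of the general idea behind a range-distributed regular subdomain for a spatial distance join, not the thesis's implementation. It assumes planar point data, a uniform square grid, and Euclidean distance. All names (`cell_of`, `partition_for_distance_join`, `join_cell`, `distance_join`) and parameters (`size`, `d`) are hypothetical. Each point of the first dataset is assigned to exactly one grid cell, while each point of the second dataset is replicated into every cell that its distance-`d` bounding box touches, so every cell can be joined independently with no shared state.

```python
from collections import defaultdict
from math import floor, hypot


def cell_of(x, y, size):
    """Return the (column, row) index of the grid cell containing (x, y)."""
    return (floor(x / size), floor(y / size))


def partition_for_distance_join(points_a, points_b, size, d):
    """Build independent per-cell work units (assumed scheme, not the thesis's).

    Points of A go to exactly one cell. Points of B are copied into every
    cell overlapped by their bounding box expanded by d, so any A point
    within distance d of a B point is guaranteed to meet it in A's cell.
    """
    cells = defaultdict(lambda: ([], []))
    for p in points_a:
        cells[cell_of(p[0], p[1], size)][0].append(p)
    for q in points_b:
        cx0, cy0 = cell_of(q[0] - d, q[1] - d, size)
        cx1, cy1 = cell_of(q[0] + d, q[1] + d, size)
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                cells[(cx, cy)][1].append(q)
    return cells


def join_cell(a_pts, b_pts, d):
    """Join one cell by brute force; needs no data from other cells."""
    return [(p, q) for p in a_pts for q in b_pts
            if hypot(p[0] - q[0], p[1] - q[1]) <= d]


def distance_join(points_a, points_b, size, d):
    """Run every cell's join; the loop body could be dispatched to workers."""
    result = []
    cells = partition_for_distance_join(points_a, points_b, size, d)
    for a_pts, b_pts in cells.values():
        result.extend(join_cell(a_pts, b_pts, d))
    return result


if __name__ == "__main__":
    a = [(0.5, 0.5), (9.9, 9.9), (5.0, 5.0)]
    b = [(1.2, 0.5), (10.1, 10.1), (20.0, 20.0)]
    print(sorted(distance_join(a, b, size=1.0, d=1.0)))
```

Because every point of the first dataset lives in a single cell, each matching pair is produced once and no deduplication step is needed. The cost is the replication of the second dataset, which grows as `d` becomes large relative to `size`. With unevenly distributed data some cells receive far more points than others, which is the kind of load imbalance a per-subdomain workload estimate is meant to address.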
[Degree-granting institution]: Zhejiang University
[Degree level]: Doctoral
[Year degree conferred]: 2017
[Classification number]: TP311.13; P208
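Related doctoral dissertations (top 1)
1 Zhao Xianwei (赵贤威); Research on Parallel Computing Methods for Spatial Big Data in Cloud Environments Considering the Distribution Characteristics of Spatial Subdomains [D]; Zhejiang University; 2017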
Article ID: 1524504
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/1524504.html