
Design and Performance Analysis of an HBase-Based Satellite Spatial Data Query System

Published: 2018-11-04 11:40
[Abstract]: With the convergence of aerospace and information technology, the reach of data sensing and acquisition has expanded greatly, and the volume of satellite spatial data has grown rapidly. Because of the 4V characteristics (Volume, Variety, Value, Velocity) of this data, traditional SQL databases, which are limited in scalability and parallelism, struggle to provide the storage and processing needed to analyze it. Hadoop and HBase, which have developed rapidly in recent years and are built on parallel computation and scalable storage, offer an effective way to store and query massive data sets. Without a sound overall design, however, the time-ordered nature of satellite data combined with HBase's lexicographic ordering of Rowkeys readily causes hotspots as data is written, which harms load balancing and both storage and query performance. In addition, as the volume of imported spatial data grows, Regions split and merge repeatedly, which degrades write performance. Furthermore, for range queries over multidimensional spatial data, column-based queries in HBase require a full table scan, so query efficiency is too low to meet practical needs. To address these problems, the system is designed from two sides, storage and query. On the storage side, a hashing scheme for spatial data Rowkeys and a pre-partitioning scheme are proposed; together they avoid hotspots, balance the load, and improve write performance. On the query side, a GKD-HBase index model is proposed that combines Grid and KD-tree indexing, using them as the first and second index, respectively.
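The storage-side idea described above (hashing the Rowkey and pre-partitioning the table) can be illustrated with a minimal sketch. The abstract does not give the thesis's actual key layout, so everything here is an assumption made for illustration: the `satellite_id|timestamp` natural key, the MD5-derived two-digit bucket prefix, the bucket count of 16, and the function names `salted_rowkey` and `split_keys`. The sketch only builds byte strings; it does not call any HBase client API.

```python
import hashlib
from typing import List

NUM_BUCKETS = 16  # hypothetical number of pre-split regions


def salted_rowkey(satellite_id: str, timestamp_ms: int,
                  buckets: int = NUM_BUCKETS) -> bytes:
    """Prefix a time-ordered natural key with a hash-derived bucket.

    Consecutive timestamps then sort under different prefixes, so writes
    are spread over several key ranges instead of piling onto one.
    """
    natural = f"{satellite_id}|{timestamp_ms:013d}".encode()
    digest = hashlib.md5(natural).digest()
    bucket = int.from_bytes(digest[:4], "big") % buckets
    return f"{bucket:02d}|".encode() + natural


def split_keys(buckets: int = NUM_BUCKETS) -> List[bytes]:
    """Boundary keys for pre-splitting: one boundary between adjacent buckets."""
    return [f"{b:02d}|".encode() for b in range(1, buckets)]


if __name__ == "__main__":
    for i in range(3):
        print(salted_rowkey("SAT01", 1400000000000 + i))
    print(len(split_keys()), "split points")
```

Creating the table with these boundary keys up front would give each bucket its own key range from the start. The trade-off of hashing the whole natural key is that a scan over a time range has to be issued once per bucket and the results merged.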
A Hilbert space-filling curve is used to reduce the multidimensional data to one dimension for querying, which improves query efficiency. Finally, the storage and query performance of the system is tested and analyzed. The results show that Rowkey hashing of the spatial data together with pre-partitioning avoids hotspots in the cluster and balances the load, and that write performance is best when the Region size is 7 GB. In a big-data setting, the proposed GKD-HBase index supports efficient range queries over massive multidimensional spatial data, offers a significant performance advantage over the Grid index, and provides solid support for practical HBase-based satellite spatial data query applications. Association analysis of the query results can uncover a large amount of latent information about maritime and airborne targets; for example, such targets can be identified and tracked from aviation data, which in turn requires real-time storage and fast querying of massive satellite spatial data. The storage and query design of this system improves both storage and query performance and therefore has practical value.
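The dimensionality reduction mentioned above can be sketched for the two-dimensional case. This is the standard iterative mapping from a grid cell to its position along a Hilbert curve, not the thesis's own implementation, which the abstract does not describe. The longitude/latitude quantization in `grid_cell`, the choice of two dimensions, and both function names are assumptions made for illustration.

```python
from typing import Tuple


def grid_cell(lon: float, lat: float, order: int) -> Tuple[int, int]:
    """Quantize a longitude/latitude pair to a cell of a 2^order x 2^order grid."""
    n = 1 << order
    x = min(int((lon + 180.0) / 360.0 * n), n - 1)
    y = min(int((lat + 90.0) / 180.0 * n), n - 1)
    return x, y


def hilbert_index(order: int, x: int, y: int) -> int:
    """Map cell (x, y) to its distance along a Hilbert curve of the given order."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


if __name__ == "__main__":
    x, y = grid_cell(116.4, 39.9, order=8)
    print((x, y), hilbert_index(8, x, y))
```

Because cells that are consecutive on the curve are always neighbours in the grid, a one-dimensional key built from this index tends to keep spatially close records close together in a lexicographically sorted store, which is what makes range scans over the reduced key useful.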
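[Degree-granting institution]: Beijing University of Chemical Technology
[Degree level]: Master's
[Year degree conferred]: 2015
[Classification number]: TP333; TP311.13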



