A Preliminary Study of Storage and Management Mechanisms for a Spatial Information Service Cloud
Topics: spatial information service cloud + differentiated data redundancy; Source: Chengdu University of Technology, master's thesis, 2012
[Abstract]: A cloud service combines computer hardware, software, information security, networking, spatial information, communication, virtualization, clustering and storage technologies, together with parallel computing, to bring large numbers of resources distributed across a network under unified management and scheduling. The result is a large resource pool that serves users on demand and scales easily. Cloud services can be classified in several ways. By service mode, they fall into three categories: public, private and hybrid clouds. By architecture, there are three models: SaaS, PaaS and IaaS. By how users use them, cloud services can also be divided into file clouds, device clouds and application clouds.
Many cloud services are on the market today, but few of them are combined with spatial information. Spatial information is information that describes geographic location, the distribution of spatial entities, and their spatio-temporal characteristics. By incomplete statistics, more than 80% of the information humans can obtain is related to spatial information. Spatial information data is typically large in volume, diverse in type, complex in structure, highly specialized, widely sourced and strongly real-time. Current storage and management of spatial information data falls short in three respects: it lacks the ability to sense spatial information data quickly, it lacks storage and management mechanisms for massive spatial information data, and it lacks backup and disaster-recovery mechanisms for that data. Providing a spatial information service cloud is therefore of great significance.
Starting from the current state of cloud services, taking the theoretical system and system architecture of the G/S mode as its theoretical basis, and drawing on related research topics and projects, this thesis analyzes the characteristics of spatial information data and the characteristics and requirements of a spatial information service cloud, and compares mainstream storage, parallel computing and distributed computing technologies. Using HDFS, an open-source distributed file system that is currently both stable and popular, as the foundation and the MapReduce distributed computing framework as the programming model, it studies the overall architecture of the spatial information service cloud and its internal data storage and management mechanisms. The main results are as follows:
(1) The characteristics of spatial information data are studied; they can be summarized in four aspects: massive, multi-source, heterogeneous and spatio-temporal.
(2) The specific requirements of a spatial information service cloud are studied. Against these requirements, several storage technologies (RAID, the FastDFS file system, the MooseFS file system and HDFS) are compared in terms of architecture and application domains, and the parallel computing technology MPI is compared with the distributed computing framework MapReduce. HDFS and the MapReduce programming framework are selected as the infrastructure and programming model because they suit the characteristics of spatial information data and meet the requirements of the spatial information service cloud.
(3) The thesis studies the issues to consider when building the spatial information service cloud platform so that the master node does not become a single point of failure, and gives a concrete configuration scheme for this.
(4) Based on how frequently users access system data, and by adding a system log file, a MapReduce application was written to handle differentiated data redundancy. This both makes the system more practical and saves storage cost and management resources.
(5) A data update mechanism tailored to spatial information data is proposed: an incremental update mechanism that preserves historical data.
(6) Finally, the whole system is tested. Theory is combined with practice and verified by experiment, so that the concrete experimental data and methods can be used to refine the theory.
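Result (4) counts how often each file is accessed by running a MapReduce job over a system log and then adjusts each file's redundancy. The abstract gives neither the log format, the thresholds nor the job's code, so the following is only a minimal single-process Python sketch under stated assumptions: the whitespace-separated log format, the `READ` operation name, the `/gis/...` paths and the threshold table `POLICY` are all hypothetical. It simulates the map, shuffle and reduce phases in memory rather than using the Hadoop Java API, and it prints `hadoop fs -setrep` commands instead of running them.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# ASSUMPTION: hypothetical log format "<timestamp> <user> <operation> <hdfs-path>",
# e.g. "1330000000 alice READ /gis/dem/tile_001.tif". Not taken from the thesis.

# ASSUMPTION: hypothetical policy, checked from the highest threshold down:
# (minimum number of reads, replication factor).
POLICY: List[Tuple[int, int]] = [(100, 5), (10, 3)]
MIN_REPLICATION = 2  # replicas kept for rarely read ("cold") files


def map_access(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (path, 1) for each READ record; skip anything else."""
    parts = line.split()
    if len(parts) == 4 and parts[2] == "READ":
        yield parts[3], 1


def reduce_access(path: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce phase: total the partial counts for one path."""
    return path, sum(counts)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Simulate map -> shuffle (group values by key) -> reduce in one process."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in map_access(line):
            groups[key].append(value)
    return dict(reduce_access(key, values) for key, values in groups.items())


def choose_replication(reads: int) -> int:
    """Give frequently read files more replicas and rarely read files fewer."""
    for threshold, replicas in POLICY:
        if reads >= threshold:
            return replicas
    return MIN_REPLICATION


def plan(lines: Iterable[str]) -> Dict[str, int]:
    """Target replication factor for every path that appears in the log."""
    return {path: choose_replication(reads) for path, reads in run_job(lines).items()}


if __name__ == "__main__":
    log = ["1330000000 alice READ /gis/dem/tile_001.tif"] * 12
    log.append("1330000001 bob READ /gis/vector/roads.shp")
    log.append("1330000002 bob WRITE /gis/vector/roads.shp")  # not a read: ignored
    # Print the commands instead of running them against a live cluster.
    for path, replicas in sorted(plan(log).items()):
        print(f"hadoop fs -setrep {replicas} {path}")
```

Run as a script, this prints `hadoop fs -setrep 3 /gis/dem/tile_001.tif` followed by `hadoop fs -setrep 2 /gis/vector/roads.shp`. Files that never appear in the log get no entry, so in this sketch they keep whatever replication factor they already have; moderately hot files stay at HDFS's default of 3, while rarely read files drop to 2 to save disk space.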
On this basis, the thesis makes the following innovations:
(1) A differentiated data redundancy mechanism based on file access frequency is proposed. Differentiated data redundancy means dynamically changing the number of replicas of a file according to how often its data is accessed, in order to obtain better file access performance while using as little disk space as possible.
(2) An incremental update mechanism for spatial information data that preserves historical data is proposed. It is designed around the characteristics of spatial information data, and users can compare historical data with the latest data to understand how spatial features have changed.
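Innovation (2) describes an incremental update that keeps historical versions so that old and new data can be compared. The abstract does not describe a storage layout, so the block below is a hypothetical in-memory Python sketch: each increment carries only the features that changed, every change is appended as a new timestamped version rather than overwriting the old one, and `as_of`/`diff` let a caller reconstruct and compare the state at two points in time. The class `IncrementalStore`, its methods, the integer year timestamps and the feature IDs are illustrative assumptions, not the thesis's design.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class VersionedFeature:
    """Every stored version of one spatial feature, in ascending time order."""
    timestamps: List[int] = field(default_factory=list)
    values: List[Dict[str, Any]] = field(default_factory=list)


class IncrementalStore:
    """Hypothetical append-only store: updates add versions, never overwrite."""

    def __init__(self) -> None:
        self._features: Dict[str, VersionedFeature] = {}

    def apply_increment(self, timestamp: int, changes: Dict[str, Dict[str, Any]]) -> None:
        """Record a new version only for the features listed in `changes`."""
        # Validate the whole increment first so a bad one changes nothing.
        for fid in changes:
            feature = self._features.get(fid)
            if feature is not None and timestamp <= feature.timestamps[-1]:
                raise ValueError(f"increment for {fid!r} is not newer than its last version")
        for fid, attrs in changes.items():
            feature = self._features.setdefault(fid, VersionedFeature())
            feature.timestamps.append(timestamp)
            feature.values.append(dict(attrs))  # copy so later edits cannot alter history

    def as_of(self, fid: str, timestamp: int) -> Optional[Dict[str, Any]]:
        """Version of `fid` valid at `timestamp`, or None if it did not exist yet."""
        feature = self._features.get(fid)
        if feature is None:
            return None
        i = bisect_right(feature.timestamps, timestamp)
        return feature.values[i - 1] if i else None

    def history(self, fid: str) -> List[Tuple[int, Dict[str, Any]]]:
        """All (timestamp, attributes) pairs kept for `fid`."""
        feature = self._features.get(fid)
        return list(zip(feature.timestamps, feature.values)) if feature else []

    def diff(self, fid: str, t_old: int, t_new: int) -> Dict[str, Tuple[Any, Any]]:
        """Attributes whose values differ between two points in time."""
        old = self.as_of(fid, t_old) or {}
        new = self.as_of(fid, t_new) or {}
        return {
            key: (old.get(key), new.get(key))
            for key in sorted(old.keys() | new.keys())
            if old.get(key) != new.get(key)
        }


if __name__ == "__main__":
    store = IncrementalStore()
    store.apply_increment(2010, {"lake_01": {"area_km2": 12.4}, "parcel_7": {"landuse": "farmland"}})
    store.apply_increment(2012, {"parcel_7": {"landuse": "residential"}})  # only the change
    print(store.as_of("parcel_7", 2011))
    print(store.diff("parcel_7", 2010, 2012))
    print(store.as_of("lake_01", 2012))
```

The three calls print `{'landuse': 'farmland'}`, `{'landuse': ('farmland', 'residential')}` and `{'area_km2': 12.4}`: the unchanged lake is not rewritten in 2012 but is still found through its 2010 version. Storage per increment therefore grows only with what changed, at the cost of a per-feature lookup when reconstructing a full snapshot. Deletions are not modeled here; one option would be to store a tombstone version.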
[Degree-granting institution]: Chengdu University of Technology
[Degree level]: Master's
[Year awarded]: 2012
[CLC classification]: TP333