Research on Distributed Storage of Massive Astronomical Images Based on NoSQL Databases
Published: 2019-04-23 14:42
[Abstract]: With the rapid development of computer and network technology and the continual upgrading of software and hardware, data volumes are growing exponentially. Data at this scale is referred to as massive data, or even big data, and its growth marks the arrival of the big data era. Unlike in the past, a growing share of this data is unstructured, such as audio, image, and video files. In astronomy, as the design and manufacture of observation and terminal equipment improve, and as observation stations are built in more locations and at larger scales, observational capability has grown many times over, and the field has moved from traditional optical observation to research across the full range of wavelengths. Astronomical data is growing at a striking rate every hour, even every second, so the field faces the challenge of storing massive data.
For astronomical data at this scale, the traditional relational database is no longer an ideal solution; its inherent characteristics even become a limitation on massive data storage. Meanwhile, cloud computing and cloud storage have brought new approaches to storage and computation, and with them new change to the IT field. Against this background, this thesis studies the application prospects of NoSQL databases, as a cloud storage platform, for storing massive numbers of astronomical images. It uses MongoDB to examine in depth both cloud storage technology and the application of NoSQL databases in astronomy.
First, the underlying theory is surveyed. Second, the construction of a MongoDB-based massive data storage system and the implementation of its key techniques are studied. Third, experiments are carried out on the resulting astronomical data storage system. Four groups of experiments store large numbers of astronomical FITS files, and the measurements are compared and analyzed, leading to the following conclusions. First, in a distributed store such as a NoSQL database, sharding can substantially improve storage and retrieval performance. Second, the chunk size also affects storage and retrieval performance, so finding the best chunk size is critical for distributed storage; for 4 MB FITS files, storage efficiency was highest at a chunk size of 512 KB among the sizes tested. Third, a memory-mapped database such as MongoDB exhibits some blocking during both storage and retrieval, and the experiments show that this blocking has no obvious relation to sharding. Fourth, the best chunk size varies with file size: among the seven chunk sizes tested, for FITS files smaller than 16 MB the ratio of best chunk size to file size is 1:8, whereas for FITS files of 16 MB or larger the best chunk size no longer grows with file size, and storage efficiency is highest at about 1 MB.
In addition, with only two ordinary servers, analysis of the experimental data shows that read and write speeds can reach 80 MB/s. If the cluster is improved (for example with more memory, higher bandwidth, multiple network cards, and more data nodes), both storage capacity and speed should increase considerably, making efficient storage of massive astronomical data achievable. Cloud storage is exactly such a platform, able to integrate networked storage resources across many nodes, from which the thesis infers that cloud storage is the inevitable trend for massive astronomical data storage.
Finally, the research is summarized, conclusions are drawn, and directions for future work are proposed.
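The relationship between file size and best chunk size reported in the abstract can be written down as a small rule of thumb. The Python sketch below is only an illustration of that stated finding: the function names `suggested_chunk_size` and `chunk_count` are hypothetical, and the thresholds are taken directly from the abstract's figures (a 1:8 ratio below 16 MB, about 1 MB from 16 MB upward). It assumes those figures hold between the file sizes actually tested, which the abstract does not confirm, and it is not the thesis's own code.

```python
KB = 1024
MB = 1024 * KB


def suggested_chunk_size(file_size_bytes: int) -> int:
    """Return a chunk size in bytes following the abstract's reported trend.

    Assumption: files under 16 MB use one eighth of the file size,
    files of 16 MB or more use a fixed 1 MB chunk.
    """
    if file_size_bytes <= 0:
        raise ValueError("file size must be positive")
    if file_size_bytes < 16 * MB:
        # 1:8 ratio of chunk size to file size; never return 0 for tiny files.
        return max(1, file_size_bytes // 8)
    # From 16 MB upward the reported optimum stays at about 1 MB.
    return 1 * MB


def chunk_count(file_size_bytes: int, chunk_size_bytes: int) -> int:
    """Number of chunks needed to hold the file (ceiling division)."""
    return -(-file_size_bytes // chunk_size_bytes)


if __name__ == "__main__":
    for size_mb in (4, 8, 16, 64):
        size = size_mb * MB
        chunk = suggested_chunk_size(size)
        print(f"{size_mb} MB file: chunk {chunk // KB} KB, "
              f"{chunk_count(size, chunk)} chunks")
```

For a 4 MB file the rule gives 512 KB, matching the value the abstract reports for that file size. Any real deployment would need to measure its own optimum, since the abstract's numbers come from a two-server test cluster.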
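[Degree-granting institution]: Kunming University of Science and Technology
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: TP391.41;TP333
Article ID: 2463561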
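[Citing literature]
Related master's theses: 1 entry
1 Ma Wenjie (马文杰); Research and Application of Massive Data Storage Based on CAP Theory [D]; Soochow University; 2013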
Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2463561.html