Research and Implementation of a Large-Scale Time Series Data Storage System
Published: 2018-08-14 18:30
[Abstract]: Time series data, a sequence of data points collected at fixed intervals over a period of time, has become an important form of recording information in production and daily life. RRDtool is a widely used database tool for storing time series data. However, time series storage systems built on RRDtool suffer from heavy I/O load when handling large-scale workloads, and the number of RRD files they can process per unit time does not meet demand. At the same time, rapid data growth requires storage capacity that scales well, and in particular capacity that can be adjusted without interrupting online operation. In addition, because the system may fail or be partially damaged by a disaster, a storage scheme is needed that keeps the system available in such cases.
To address these problems, this thesis studies and implements a storage system for large-scale time series data. Its key components are mem-RRD and MooseFS: the former is an improved implementation of RRDtool with better I/O performance; the latter is a distributed file system that provides availability and scalability for the storage system.
The thesis first presents the design and implementation of mem-RRD, an improved RRDtool scheme based on user-space buffering. It then gives a scheme for building and deploying a storage system for large-scale time series data using mem-RRD and MooseFS. Finally, it tests the I/O performance, availability, and scalability of the storage system in detail and compares and analyzes the test data. The test results show that the storage system built on mem-RRD and MooseFS achieves considerable improvement or performs well in I/O performance, availability, and capacity scalability.
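The abstract does not describe how mem-RRD's user-space buffer works internally. As a rough illustration of the general idea only, the sketch below queues updates per RRD file in memory and hands them to storage in batches, so that several data points cost one write instead of several. The class name UpdateBuffer, the flush_threshold parameter, and the write_batch callback are all hypothetical and are not taken from the thesis or from RRDtool.

```python
from collections import defaultdict


class UpdateBuffer:
    """Hypothetical user-space buffer that batches updates per file.

    This is an illustrative sketch, not the mem-RRD design.
    """

    def __init__(self, flush_threshold, write_batch):
        # flush_threshold: pending updates per file that trigger a flush
        # write_batch: callable(path, updates) that performs the real write
        self.flush_threshold = flush_threshold
        self.write_batch = write_batch
        self.pending = defaultdict(list)

    def update(self, path, timestamp, value):
        """Queue one data point; flush the file's queue once it is full."""
        queue = self.pending[path]
        queue.append((timestamp, value))
        if len(queue) >= self.flush_threshold:
            self.flush(path)

    def flush(self, path):
        """Write all pending updates for one file in a single call."""
        updates = self.pending.pop(path, [])
        if updates:
            self.write_batch(path, updates)

    def flush_all(self):
        """Flush every file that still has pending updates."""
        for path in list(self.pending):
            self.flush(path)


writes = []  # records each simulated storage write
buf = UpdateBuffer(
    flush_threshold=3,
    write_batch=lambda path, updates: writes.append((path, list(updates))),
)
for t in range(6):
    buf.update("host1.rrd", 1000 + 60 * t, float(t))
buf.update("host2.rrd", 1000, 0.5)
buf.flush_all()
print(len(writes))  # 7 queued updates reach storage in 3 batched writes
```

In a real deployment the write_batch callback would apply the queued points to the RRD file; a buffer like this trades a window of unflushed data in memory for fewer disk operations, which is why a periodic or shutdown-time flush_all is needed.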
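[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year degree awarded]: 2013
[Classification number]: TP333
Article ID: 2183738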
Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2183738.html