Research on Caching Technology for Parallel File Systems
Published: 2018-09-10 10:11
【Abstract】: The rapid growth of Internet technology has brought an ever-growing volume of data and, with it, ever-higher demands on data storage. Many application systems now hold petabyte (PB) scale data, which puts enormous pressure on storage capacity, and storing such data quickly and efficiently has become a research focus in storage technology. Storage systems are currently judged on four main criteria: performance, availability, scalability, and security. A single, simple file system can no longer meet today's storage needs. Parallel file systems, with their high scalability, performance, availability, and security, have become the approach to data storage and management widely adopted in industry. Caching determines how fast a file system can store data, and a good caching technique can improve system performance substantially.
Against this background, this thesis studies caching for parallel file systems and, for the parallel file system GlusterFS, designs an intermediate caching architecture (InterMediate Caching architecture, IMCa) based on the in-memory caching system Memcached. The main work is as follows. The thesis first introduces three storage technologies and describes their respective characteristics and differences, then briefly surveys several classic file systems. It then analyzes the architecture of GlusterFS, the working principles of its client and server, and the design principle by which GlusterFS implements its features through translators (Translator). Building on Memcached, it then designs the IMCa cache system, which consists of three parts: the GlusterFS client, the Memcached cache layer, and the GlusterFS server. The design takes into account cache replacement, hot data, and how the cache layer connects to the GlusterFS client.
The Memcached cache layer of IMCa is itself divided into three layers: a network interface layer, a system control layer, and a data storage layer. The network interface layer accepts and manages client connections, parses and processes commands, and handles concurrent connections. The system control layer is the core of the cache layer and provides load balancing, data management, and replica management. Data management controls data operations such as adding, querying, and updating data. Replica management copies hot data to other standby cache nodes so that reads stay consistent when requests for hot data arrive. The data storage layer handles the actual storage of data and assigns each stored item a validity period (expiration time).
Finally, the thesis builds a system according to the proposed framework and tests its performance. The experimental results show that the cache system helps, to some extent, to improve the performance of GlusterFS. 32 figures, 56 references.
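The request path through the three IMCa parts (GlusterFS client, Memcached cache layer, GlusterFS server) can be illustrated as a read-through cache. This is a minimal sketch under stated assumptions, not the thesis's implementation: the class `ReadThroughCache`, its two callbacks, and the write-through policy on updates are hypothetical. A Python dictionary stands in for Memcached, and plain functions stand in for GlusterFS server I/O.

```python
from typing import Callable, Dict, Optional


class ReadThroughCache:
    """Hypothetical sketch of a client -> cache layer -> server read path."""

    def __init__(
        self,
        read_from_server: Callable[[str], bytes],
        write_to_server: Callable[[str, bytes], None],
    ) -> None:
        self._read_from_server = read_from_server  # stands in for a GlusterFS server read
        self._write_to_server = write_to_server    # stands in for a GlusterFS server write
        self._cache: Dict[str, bytes] = {}         # stands in for the Memcached cache layer
        self.hits = 0
        self.misses = 0

    def read(self, path: str) -> bytes:
        cached: Optional[bytes] = self._cache.get(path)
        if cached is not None:
            self.hits += 1
            return cached
        self.misses += 1
        data = self._read_from_server(path)  # miss: fetch from the server
        self._cache[path] = data             # then populate the cache
        return data

    def write(self, path: str, data: bytes) -> None:
        # Assumed write-through policy: the server stays authoritative, and the
        # cached copy is refreshed so the next read does not see stale data.
        self._write_to_server(path, data)
        self._cache[path] = data
```

Only the first read of a path reaches the server; later reads are served from memory, which is where a cache between client and server can save latency.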
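The network interface layer parses commands and handles many concurrent client connections. The abstract does not give its protocol; the sketch below assumes a small subset of the real Memcached text protocol (`get <key>*` and `set <key> <flags> <exptime> <bytes>` followed by a data block) and uses `asyncio` so that one coroutine serves each connection. The function names `handle_command` and `serve_client` are hypothetical.

```python
import asyncio
from typing import Dict, Tuple

# key -> (flags, data); a plain dict stands in for the cache's storage layer.
Store = Dict[str, Tuple[int, bytes]]


def handle_command(line: bytes, payload: bytes, store: Store) -> bytes:
    """Execute one `get` or `set` command from the Memcached text protocol."""
    parts = line.decode("ascii").split()
    if not parts:
        return b"ERROR\r\n"
    if parts[0] == "get":
        out = bytearray()
        for key in parts[1:]:  # `get` may name several keys
            if key in store:
                flags, data = store[key]
                out += f"VALUE {key} {flags} {len(data)}\r\n".encode("ascii")
                out += data + b"\r\n"
        return bytes(out) + b"END\r\n"
    if parts[0] == "set" and len(parts) >= 5:
        # set <key> <flags> <exptime> <bytes>; exptime is ignored in this sketch
        store[parts[1]] = (int(parts[2]), payload)
        return b"STORED\r\n"
    return b"ERROR\r\n"


async def serve_client(
    reader: asyncio.StreamReader, writer: asyncio.StreamWriter, store: Store
) -> None:
    """Serve one connection; asyncio runs one such coroutine per client."""
    while True:
        line = await reader.readline()
        if not line:  # client closed the connection
            break
        parts = line.split()
        payload = b""
        if len(parts) >= 5 and parts[0] == b"set":
            nbytes = int(parts[4])
            # The data block ends with \r\n, which is not part of the value.
            payload = (await reader.readexactly(nbytes + 2))[:-2]
        writer.write(handle_command(line, payload, store))
        await writer.drain()
    writer.close()
```

To accept many clients, the coroutine can be passed to `asyncio.start_server`, for example `asyncio.start_server(lambda r, w: serve_client(r, w, store), "127.0.0.1", 11211)`. The `noreply` option, other commands, and validation of malformed lines are left out.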
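The system control layer is said to perform load balancing across cache nodes, but the abstract does not say how keys are assigned to nodes. One common choice for spreading keys over Memcached servers is consistent hashing; it is assumed here, not taken from the thesis. The class `ConsistentHashRing`, the node names, and the default of 100 virtual points per node are illustrative.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(key: str) -> int:
    # md5 gives a hash that is stable across processes (unlike built-in hash()).
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Hypothetical key -> cache-node mapping using consistent hashing."""

    def __init__(self, nodes: List[str], vnodes: int = 100) -> None:
        self._ring: List[int] = []      # sorted virtual-point hashes
        self._owner: Dict[int, str] = {}  # virtual-point hash -> physical node
        for node in nodes:
            self.add_node(node, vnodes)

    def add_node(self, node: str, vnodes: int = 100) -> None:
        # Several virtual points per node give a more even spread of keys.
        for i in range(vnodes):
            h = _hash(f"{node}#{i}")
            self._owner[h] = node
            bisect.insort(self._ring, h)

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first virtual point at or after the key's hash,
        # wrapping around to the start of the ring. Assumes at least one node.
        h = _hash(key)
        idx = bisect.bisect_left(self._ring, h) % len(self._ring)
        return self._owner[self._ring[idx]]
```

The reason to prefer this over `hash(key) % n` is that adding a node only moves the keys that now land on the new node's points; every other key keeps its old node, so most cached data stays valid.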
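Replica management copies hot data to standby cache nodes so that reads stay consistent under heavy demand. The abstract does not define "hot" or the replication protocol, so the sketch below makes three labeled assumptions: a key becomes hot once it has been read more than a fixed threshold of times, reads of a hot key rotate round-robin over all copies, and every update is written synchronously to all copies so no replica can return an older value. The class `HotDataReplicator` and its parameters are hypothetical.

```python
import itertools
from collections import Counter
from typing import Dict, List, Optional


class HotDataReplicator:
    """Hypothetical sketch of hot-data replication across cache nodes."""

    def __init__(self, primary: str, standbys: List[str], hot_threshold: int = 3) -> None:
        self.nodes: Dict[str, Dict[str, bytes]] = {n: {} for n in [primary, *standbys]}
        self.primary = primary
        self.standbys = standbys
        self.hot_threshold = hot_threshold
        self.reads: Counter = Counter()
        self.replicas: Dict[str, List[str]] = {}    # hot key -> nodes holding a copy
        self._rr: Dict[str, itertools.cycle] = {}   # round-robin iterator per hot key

    def set(self, key: str, value: bytes) -> None:
        # Update every node holding the key so replicas never diverge.
        for node in self.replicas.get(key, [self.primary]):
            self.nodes[node][key] = value

    def get(self, key: str) -> Optional[bytes]:
        self.reads[key] += 1
        if key not in self.replicas and self.reads[key] > self.hot_threshold:
            self._promote(key)
        node = next(self._rr[key]) if key in self.replicas else self.primary
        return self.nodes[node].get(key)

    def _promote(self, key: str) -> None:
        # Copy a newly hot key from the primary to every standby node.
        value = self.nodes[self.primary].get(key)
        if value is None:
            return
        for node in self.standbys:
            self.nodes[node][key] = value
        self.replicas[key] = [self.primary, *self.standbys]
        self._rr[key] = itertools.cycle(self.replicas[key])
```

Synchronous updates to all copies trade write cost for read consistency. A real deployment would also need to demote keys that cool down and to handle node failures, which this sketch omits.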
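The data storage layer gives each item a validity period, and the design considers cache replacement. Memcached itself combines a per-item expiration time with approximately least-recently-used (LRU) eviction, so the sketch below pairs a TTL with LRU replacement. It is an illustration, not the thesis's storage layer: the class `TTLLRUStore`, its capacity counted in items rather than bytes, and the injectable clock are assumptions.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional, Tuple


class TTLLRUStore:
    """Hypothetical store: items expire after a TTL; when full, LRU is evicted."""

    def __init__(self, capacity: int, clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity  # maximum number of items (not bytes)
        self._clock = clock
        self._items: "OrderedDict[str, Tuple[bytes, float]]" = OrderedDict()

    def set(self, key: str, value: bytes, ttl: float) -> None:
        if key in self._items:
            self._items.move_to_end(key)  # an update also counts as a use
        self._items[key] = (value, self._clock() + ttl)
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used item

    def get(self, key: str) -> Optional[bytes]:
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]  # lazily drop the expired item on access
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return value
```

Expired items are removed lazily, when they are next read, rather than by a background sweep; Memcached also reclaims expired items lazily, which keeps the common path cheap.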
【Degree-granting institution】: Beijing Jiaotong University
【Degree level】: Master's
【Year awarded】: 2013
【CLC classification】: TP333; TP316.4
Article ID: 2234151
Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2234151.html