Design and Implementation of a Distributed File System Client Based on POSIX Semantics
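Published: 2018-03-07 22:04
Topic: distributed file system. Focus: FUSE. Source: University of Electronic Science and Technology of China, master's thesis, 2013. Type: degree thesis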
[Abstract]: Since the concept of cloud computing was proposed, cloud storage (distributed file systems), one of its core components, has quickly become a research hotspot. Unlike conventional storage, cloud storage uses a cluster of many commodity PCs to provide massive distributed data storage, and relies on data redundancy to achieve transparent storage and tolerance of failures in the underlying nodes. How to store data efficiently is one of the key problems in cloud storage.

After a detailed introduction to and comprehensive analysis of existing distributed file systems, this thesis proposes CStore, a distributed file system for private clouds. CStore consists of a large number of metadata servers, data server nodes, and other components, and provides an efficient storage service to upper-layer users. Drawing on the actual project work, the thesis then designs and implements the POSIX-semantics Linux client of the CStore system. The client's basic work and innovations are as follows:

1. FUSE file system. The client implements a user-space file system driver based on the FUSE protocol. It presents cloud data locally as a virtual disk, so existing programs run without modification. Internally, the client uses a pipeline processing model: each functional module is managed by one thread, and threads handle asynchronous messages through Unix domain sockets and wait queues.

2. Cache organization. To reduce the load on cloud servers and the communication overhead, the client proposes an efficient caching algorithm that caches metadata both in memory and in a local persistent cache, and caches data in memory. Structurally, the client manages the metadata cache as a directory tree, and supports random reads and writes of data shards by using a red-black tree and an LRU eviction policy. At startup the client automatically loads the local metadata; on exit it persists all in-memory metadata locally.

3. Network communication model. The client implements, in C++, a non-blocking asynchronous network communication framework for communicating with the servers, and uses a thread pool for asynchronous disk reads and writes. For the part that communicates with the kernel, the client designs and implements all of the module's basic data structures and business logic in C.

4. Operation-log-based synchronization. The client keeps its data consistent by using a timer to periodically pull the operation log from the metadata server and replay it locally.

5. Persistent storage. The client proposes a B+ tree based model for persisting the in-memory metadata. B+ tree indexing combined with an append-only write strategy effectively solves the problem of storing directories and index nodes (inodes) in the metadata.

The thesis presents cloud data in the form of a file system. The data in the virtual disk comes from cloud servers, and the design can also be extended to a P2P client. Finally, stress tests and comparison tests show that the client's overall performance is better than that of comparable distributed file systems.
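The operation-log synchronization in item 4 can be illustrated with a minimal sketch. This is not the CStore implementation (the thesis client is written in C and C++). Python is used here only for brevity, and every name below (`LogEntry`, `FakeMetadataServer`, `MetadataReplica`, the `create`/`unlink`/`rename` operations, and the sequence-number scheme) is a hypothetical assumption made for illustration. The sketch shows only the general technique: the client remembers the last log entry it applied, pulls newer entries, and replays them in order against its local metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    seq: int            # assumed: strictly increasing sequence number
    op: str             # hypothetical op names: "create", "unlink", "rename"
    path: str
    new_path: str = ""  # only used by "rename"


class FakeMetadataServer:
    """In-memory stand-in for a metadata server holding an append-only log."""

    def __init__(self):
        self._log = []

    def append(self, op, path, new_path=""):
        self._log.append(LogEntry(len(self._log) + 1, op, path, new_path))

    def fetch_since(self, last_seq):
        # Return every entry the caller has not seen yet, in log order.
        return [e for e in self._log if e.seq > last_seq]


class MetadataReplica:
    """Client-side metadata view, kept consistent by replaying the log."""

    def __init__(self):
        self.paths = set()
        self.last_applied = 0

    def apply(self, entry):
        if entry.seq <= self.last_applied:
            return  # already applied; replay is idempotent
        if entry.op == "create":
            self.paths.add(entry.path)
        elif entry.op == "unlink":
            self.paths.discard(entry.path)
        elif entry.op == "rename":
            self.paths.discard(entry.path)
            self.paths.add(entry.new_path)
        else:
            raise ValueError(f"unknown operation: {entry.op}")
        self.last_applied = entry.seq

    def sync(self, server):
        # One pull-and-replay round; a timer would call this periodically.
        entries = server.fetch_since(self.last_applied)
        for entry in entries:
            self.apply(entry)
        return len(entries)


server = FakeMetadataServer()
server.append("create", "/a")
server.append("create", "/b")
server.append("rename", "/a", "/c")

replica = MetadataReplica()
replica.sync(server)           # applies the three pending entries
print(sorted(replica.paths))   # prints ['/b', '/c']
```

Tracking `last_applied` means each round transfers only the new entries, and an entry delivered twice is ignored rather than applied again. How the real client schedules the pull and encodes log entries is not stated in the abstract.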
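[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP333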
Article ID: 1581144
Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/1581144.html