A Large-Scale Data Object Storage System Based on a Central-Node Architecture

Published: 2018-08-17 15:06

[Abstract]: With the arrival of massive data volumes, big data has come into wide view, and its diversity and scale have driven the rapid adoption of object storage as a new storage model. Object storage presents end users with a unified storage space in which every object has a unique access identifier, generated when the object is created. It offers two basic operations, upload (PUT) and download (GET), and is widely used because it is simple and easy to use. Existing large-object storage systems built on a central-node architecture, such as GFS and HDFS, have two limitations: their metadata grows linearly as the number of objects grows, and their single physical storage block configuration cannot store small objects effectively. Small-object storage systems built on a central-node architecture, such as Haystack, support only small objects and do not support multi-replica concurrency. To address these problems, this thesis designs and implements LaUDObject, a user-aware object storage system that supports large objects effectively while also supporting small objects efficiently. The main contributions are:
(1) To keep the object replica location table on the master node from growing too large, multiple objects are combined into a group, and the replicas of a group are stored together contiguously on a single node. The master node maintains a replica location table for object groups, which effectively reduces the size of the table. The middle 32 bits of an object identifier give its object group number, and the last 32 bits give the object's sequence number within the group.
(2) A multi-replica sequential consistency policy that supports concurrent updates to small objects is implemented, which effectively improves the efficiency of client object updates.
(3) The small objects in a group are merged into one large file with an external index, so that a read completes with a single disk access, which improves small-object access speed.
(4) By being aware of user identifiers and associating object groups with users, the system stores each user's data together, which can improve overall access efficiency.
Performance comparison experiments on LaUDObject, Hadoop, and Cassandra across several large-file and small-file storage scenarios provide preliminary validation of this work.
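Contributions (1) and (3) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract specifies only that the middle 32 bits of an identifier are the group number and the last 32 bits are the in-group sequence number, so the high-order bits are treated here as an opaque `prefix`, and every name (`pack_object_id`, `unpack_object_id`, `locate`, `GroupStore`) is hypothetical. An in-memory `io.BytesIO` stands in for the merged group file on disk.

```python
import io

MASK32 = 0xFFFFFFFF  # 32-bit field mask


def pack_object_id(prefix, group, seq):
    """Build an identifier: [prefix | 32-bit group number | 32-bit sequence]."""
    if prefix < 0 or not (0 <= group <= MASK32 and 0 <= seq <= MASK32):
        raise ValueError("field out of range")
    return (prefix << 64) | (group << 32) | seq


def unpack_object_id(object_id):
    """Split an identifier into (prefix, group number, in-group sequence)."""
    return object_id >> 64, (object_id >> 32) & MASK32, object_id & MASK32


def locate(object_id, group_locations):
    """Look up replica nodes by group number, not by individual object.

    group_locations maps group number -> list of node names, so the table
    has one entry per group instead of one entry per object.
    """
    _, group, _ = unpack_object_id(object_id)
    return group_locations[group]


class GroupStore:
    """Small objects of one group appended to a single file, indexed externally."""

    def __init__(self):
        self._data = io.BytesIO()  # assumption: stands in for the merged file
        self._index = {}           # sequence number -> (offset, length)

    def put(self, seq, payload):
        offset = self._data.seek(0, io.SEEK_END)  # always append at the end
        self._data.write(payload)
        self._index[seq] = (offset, len(payload))

    def get(self, seq):
        offset, length = self._index[seq]  # index lookup, no file access
        self._data.seek(offset)
        return self._data.read(length)     # one positioned read of the file
```

With the index held outside the data file, `get` touches the file exactly once per read, which is the property contribution (3) describes; the location table in `locate` grows with the number of groups rather than the number of objects, as in contribution (1).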
[Degree-granting institution]: Tsinghua University
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP333
