Research on a Distributed Query Method Based on Metadata Association Features

Published: 2018-06-01 07:05

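Topic of this thesis: association features + metadata query; reference: Huazhong University of Science and Technology, master's thesis, 2013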

[Abstract]: The continuing development of information technology places ever higher demands on the capacity and performance of information storage, and cloud storage has emerged in response. Large-scale storage systems are increasingly widely used, with storage capacity rising from the TB (terabyte) level to the PB (petabyte) and even EB (exabyte) level. While users enjoy massive storage space, they also find that locating and managing data is becoming increasingly difficult. Existing metadata management methods suffer from weak scalability, low query efficiency, and poor real-time performance. To address these shortcomings, a metadata query method based on association features is proposed, which makes full use of the association features of multidimensional metadata to improve query efficiency. The system builds a distributed index on top of data clustered by locality-sensitive hashing (LSH). The global index is partitioned within the buckets of the LSH hash tables, which improves scalability and avoids large-scale data migration. Index maintenance uses a layered architecture in which each layer is configured independently, giving good scalability and making the system easier to manage. To update the distributed index quickly, the index is stored in files and updated in batches based on versions. Under this architecture, query requests are handled in a proxy-based mode: each request is assigned a query server as its proxy, and the proxy node forwards the query request, collects the query results, and returns them to the client. Tests show that this query method is significantly more efficient than the traditional approach based on one-dimensional indexes, and that the system's response time grows quasi-linearly as the data scale increases. In addition, the proposed version-based batch update strategy makes index updates roughly 10 times more efficient than with a MySQL database.
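The query path summarized in the abstract (LSH bucketing of multidimensional metadata, partitioning within buckets, and a proxy that forwards and collects) can be illustrated with a small sketch. This is not the thesis's implementation: the class names `LSHIndex`, `QueryServer` and `Cluster`, the random-projection hash family, the CRC32-based placement of entries on servers, and the sample metadata vectors are all assumptions made only for illustration.

```python
import math
import random
import zlib
from collections import defaultdict


class LSHIndex:
    """Random-projection LSH over numeric metadata vectors (assumed hash family)."""

    def __init__(self, dim, num_hashes=2, bucket_width=50.0, seed=0):
        rng = random.Random(seed)
        # Each hash function is a random direction plus a random offset.
        self.projections = [
            ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, bucket_width))
            for _ in range(num_hashes)
        ]
        self.bucket_width = bucket_width
        self.buckets = defaultdict(list)  # bucket key -> [(file_id, vector)]

    def bucket_key(self, vector):
        key = []
        for direction, offset in self.projections:
            dot = sum(d * v for d, v in zip(direction, vector))
            key.append(int((dot + offset) // self.bucket_width))
        return tuple(key)

    def insert(self, file_id, vector):
        self.buckets[self.bucket_key(vector)].append((file_id, vector))

    def candidates(self, vector):
        # Only entries that hashed to the same bucket as the query are examined.
        return self.buckets.get(self.bucket_key(vector), [])


class QueryServer:
    """Holds one slice of every bucket; all servers share the same hash functions."""

    def __init__(self, dim):
        self.index = LSHIndex(dim)  # same default seed -> identical bucket keys

    def search(self, vector, radius):
        hits = []
        for file_id, stored in self.index.candidates(vector):
            dist = math.dist(vector, stored)
            if dist <= radius:
                hits.append((dist, file_id))
        return hits


class Cluster:
    def __init__(self, dim, num_servers):
        self.servers = [QueryServer(dim) for _ in range(num_servers)]

    def insert(self, file_id, vector):
        # Assumed placement rule: split each bucket across servers by file id,
        # so adding entries never moves a whole bucket between servers.
        owner = zlib.crc32(file_id.encode("utf-8")) % len(self.servers)
        self.servers[owner].index.insert(file_id, vector)

    def query(self, vector, radius):
        # Proxy role: forward the request to every server, collect the
        # partial results, and return one merged list sorted by distance.
        merged = []
        for server in self.servers:
            merged.extend(server.search(vector, radius))
        return [file_id for _, file_id in sorted(merged)]


cluster = Cluster(dim=3, num_servers=3)
# Hypothetical metadata vectors: (size in MB, age in days, owner id).
cluster.insert("a.log", [10.0, 1.0, 7.0])
cluster.insert("b.log", [10.1, 1.0, 7.0])
cluster.insert("c.iso", [900.0, 400.0, 2.0])
result = cluster.query([10.0, 1.0, 7.0], radius=1.0)
print(result)
```

Because every server uses the same hash functions, a query touches only one bucket per server, and similar metadata tends to share a bucket. A near neighbor can still be missed when it falls just across a bucket boundary, which is why practical LSH schemes usually keep several hash tables.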
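[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP333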

[References]