Research and Implementation of Key Technologies for Storage Management of Massive Virtual Identity Data
Published: 2018-08-03 20:06
[Abstract]: With the rapid development of computer networks and their applications, more and more platforms and applications have appeared online, and users may produce large amounts of virtual identity application information across them. Both static data, such as registered accounts, and user interaction content, such as messages, count as virtual identity application information, and the total volume stored reaches the terabyte or even petabyte level. In the Web 2.0 era, Internet applications must handle large volumes of user-created or user-shared data such as pictures, videos, and blog posts, which vary widely in type, format, and size. Large volume, diverse types, and uneven sizes pose a severe challenge for massive data storage and management.
This thesis is based on *** virtual identity management, a subproject of the 863 major project *** Network Identity Management and Application Technology. The subproject's main function is to obtain virtual identity data from different platforms by various means, manage the data uniformly, and provide interfaces to real network platforms and applications to support lookup and tracing. The thesis studies the key storage technologies for virtual identity data. It addresses and implements the data model used for storage, data partitioning and data replication in a distributed environment, and multi-dimensional indexing and caching to improve query efficiency. These are tested through simulated operation in a virtual identity tracing system, providing the storage foundation needed to meet the project requirements. The work is built on the Cassandra database and comprises three parts:
(1) Storage. Because virtual identity data are large in volume and involve fuzzy queries, a data model combining the MySQL and Cassandra databases is proposed. For the distributed environment, where data partitioning and data backup must be considered, the thesis designs and implements a data partitioning method based on a weighted, improved consistent hashing algorithm and a data replica strategy that combines data scale with changes in hotspots.
(2) Query. To handle queries in which no column is specified and to locate machine nodes quickly and accurately, the thesis designs and implements a multi-dimensional index that combines Cassandra indexes, an inverted index, and a node index. Taking the locality of request access into account, it also designs and implements a semantic caching technique tailored to the characteristics of virtual identity data.
(3) System implementation. Using the virtual identity tracing system as the testbed, performance tests were run on the data model, the data partitioning approach, and the replica strategy on the storage side, and on the multi-dimensional index and the semantic cache on the query side. The results show that these methods perform well in improving system efficiency.
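The abstract names a weighted consistent hashing scheme for data partitioning but does not describe it. As a rough illustration only, the sketch below shows one common way to weight a consistent hash ring: each node receives a number of virtual nodes proportional to its weight. This is a generic technique, not the thesis's actual algorithm; the class name `WeightedHashRing`, the parameter `vnodes_per_weight`, the node names, and the use of MD5 for ring positions are all assumptions made for the example.

```python
import bisect
import hashlib
from collections import Counter


def _hash(key: str) -> int:
    """Map a string to a 64-bit ring position (MD5 is an assumption here)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class WeightedHashRing:
    """Hypothetical consistent hash ring; heavier nodes get more virtual nodes."""

    def __init__(self, vnodes_per_weight: int = 100):
        self.vnodes_per_weight = vnodes_per_weight
        self._positions = []  # sorted virtual-node positions on the ring
        self._owners = {}     # ring position -> physical node name

    def add_node(self, node: str, weight: int = 1) -> None:
        # Place weight * vnodes_per_weight virtual nodes for this node.
        for i in range(weight * self.vnodes_per_weight):
            pos = _hash(f"{node}#{i}")
            if pos in self._owners:
                continue  # skip the (very unlikely) position collision
            bisect.insort(self._positions, pos)
            self._owners[pos] = node

    def remove_node(self, node: str) -> None:
        # Drop every virtual node owned by this physical node.
        self._positions = [p for p in self._positions if self._owners[p] != node]
        self._owners = {p: n for p, n in self._owners.items() if n != node}

    def get_node(self, key: str) -> str:
        # A key belongs to the first virtual node clockwise from its hash.
        if not self._positions:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._positions, _hash(key))
        return self._owners[self._positions[idx % len(self._positions)]]


if __name__ == "__main__":
    ring = WeightedHashRing()
    ring.add_node("node-a", weight=1)  # hypothetical small machine
    ring.add_node("node-b", weight=3)  # hypothetical machine with 3x capacity
    load = Counter(ring.get_node(f"user-{i}") for i in range(20000))
    print(load)  # node-b should receive roughly three times node-a's share
```

The point of the design is that adding or removing a node only moves the keys whose nearest clockwise virtual node changes, so most data stays where it is when the cluster is resized.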
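[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year degree awarded]: 2013
[Classification No.]: TP333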
Article No.: 2162866
Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2162866.html