Research and Implementation of a Distributed Provenance Information Storage System

Published: 2018-03-23 02:35

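Topic: provenance information　Focus: storage system　Source: University of Electronic Science and Technology of China, 2016 master's thesis　Type: degree thesis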

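【Abstract】: With the rapid development and adoption of cloud computing and big data technologies, the storage and management of massive data has become a focus of attention, placing higher demands on the flexibility, scalability, and concurrency of data storage. Numerous Internet applications generate large volumes of diverse unstructured data, whereas traditional relational databases use two-dimensional tables to describe data and the relationships between data, which makes them ill-suited to storing flexible, ever-changing unstructured data. To meet these needs, many new storage devices and technologies have emerged, such as SSDs, NoSQL, and distributed storage, to accommodate unstructured-data application scenarios, improve storage and read/write efficiency, and keep storage costs as low as possible. Faced with massive data, people often care about the life cycle of certain data: for example, when it was created, which users have used it, and how many copies of it exist. Such information, commonly called provenance information, is very important for data management, system security maintenance, and similar tasks. Provenance information describes an object's historical trajectory and dynamic derivation process, as well as the interactions and progression relationships between objects. Over time this data grows ever larger and the relationships between objects grow ever more complex, so the core of this thesis is how to describe and store massive provenance information effectively so that users can access it simply and efficiently. To address the storage of massive provenance information, this thesis designs and implements DBPS (Double Buffer Provenance Store), a high-performance provenance information storage system. Based on the characteristics of provenance information, DBPS adopts a multi-tier storage architecture, consisting of a cache layer and a persistent storage layer, on top of a distributed architecture built around a central node. In the cache layer, DBPS uses a double-buffer architecture that separates reads from writes, with data storage structures and indexes designed specifically for provenance information, which makes it provenance-aware. In the persistent storage layer, it uses a key-value database as the underlying persistent storage engine, improving read/write efficiency while reducing storage resource consumption. By contrast, most provenance systems and applications store provenance information directly in existing databases such as relational or graph databases, which requires complex data processing on every read and write and results in low read/write efficiency. Experimental results show that DBPS is highly efficient at creating and querying provenance object data but relatively less efficient at modifying and deleting data. Because modification and deletion are used infrequently in practice, DBPS delivers strong overall performance for accessing provenance information and meets users' needs well.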
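The abstract characterizes provenance information as an object's history (creation time, the users who used it, its number of copies) plus the derivation relationships between objects. As a hedged illustration only, the Python sketch below models such records and walks derivation links in both directions. The names `ProvenanceRecord`, `ProvenanceGraph`, `ancestors`, and `descendants` are hypothetical; the abstract does not describe DBPS's actual data structures or indexes.

```python
from __future__ import annotations

from collections import deque
from collections.abc import Callable, Iterable
from dataclasses import dataclass, field


@dataclass
class ProvenanceRecord:
    """Hypothetical provenance record for one data object."""
    obj_id: str
    created_at: str                                        # creation time, ISO 8601 string
    users: set[str] = field(default_factory=set)           # users who have used the object
    copies: int = 1                                        # number of replicas
    derived_from: list[str] = field(default_factory=list)  # ids of parent objects


class ProvenanceGraph:
    """In-memory sketch: records keyed by id, plus a parent -> children index."""

    def __init__(self) -> None:
        self.records: dict[str, ProvenanceRecord] = {}
        self.children: dict[str, set[str]] = {}

    def add(self, rec: ProvenanceRecord) -> None:
        self.records[rec.obj_id] = rec
        # Maintain the reverse index so "what was derived from X?" needs no scan.
        for parent in rec.derived_from:
            self.children.setdefault(parent, set()).add(rec.obj_id)

    def record_use(self, obj_id: str, user: str) -> None:
        self.records[obj_id].users.add(user)

    def _bfs(self, obj_id: str, neighbors: Callable[[str], Iterable[str]]) -> list[str]:
        """Breadth-first traversal; returns reachable ids in visit order."""
        seen: set[str] = set()
        order: list[str] = []
        queue = deque(neighbors(obj_id))
        while queue:
            cur = queue.popleft()
            if cur in seen:
                continue
            seen.add(cur)
            order.append(cur)
            queue.extend(neighbors(cur))
        return order

    def ancestors(self, obj_id: str) -> list[str]:
        """Objects this one was derived from, directly or transitively."""
        return self._bfs(
            obj_id,
            lambda i: self.records[i].derived_from if i in self.records else [],
        )

    def descendants(self, obj_id: str) -> list[str]:
        """Objects derived from this one, directly or transitively."""
        return self._bfs(obj_id, lambda i: sorted(self.children.get(i, ())))
```

For example, if `clean` is derived from `raw` and `report` from both `clean` and `raw`, then `ancestors("report")` returns `["clean", "raw"]` and `descendants("raw")` returns `["clean", "report"]`.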
[Abstract]: With the rapid development and application of cloud computing and big data technology, the storage and management of massive data has become a focus of attention, placing higher requirements on the flexibility, scalability, and concurrency of data storage. Many Internet applications generate large quantities of diverse unstructured data, while traditional relational databases use two-dimensional tables to describe data and the relationships between data, so they are not suitable for storing flexible and changeable unstructured data. To meet these requirements, many new storage devices and storage technologies have emerged, such as SSDs, NoSQL, and distributed storage, to adapt to unstructured-data application scenarios, improve storage and read/write efficiency, and reduce storage costs. People usually care about the life cycle of certain data, such as when it was created, who has used it, and how many copies of it exist. This information, known as provenance information, is of great significance for data management, system security maintenance, and similar tasks. Provenance information describes an object's historical trajectory and dynamic derivation process, as well as the interactions and progression between objects; over time this data grows larger and the relationships between objects become more complex. How to effectively describe and store massive provenance information so that users can access it easily and efficiently is therefore the core of this thesis. This thesis designs and implements DBPS (Double Buffer Provenance Store), a high-performance provenance information storage system. According to the characteristics of provenance information, DBPS adopts a multi-layer storage architecture, consisting of a cache layer and a persistent storage layer, on top of a central-node-based distributed architecture.
In the cache layer, DBPS adopts a double-buffer architecture with read-write separation and uses data storage structures and indexes designed specifically for provenance information, making it provenance-aware. In the persistent storage layer, a key-value database serves as the underlying persistent storage engine, which improves the efficiency of reading and writing data while reducing the consumption of storage resources. By contrast, most provenance systems and applications directly use existing databases, such as relational or graph databases, to store provenance information, which requires complex processing when reading and writing it and yields low read/write efficiency. Experimental results show that DBPS is highly efficient at creating and querying provenance object data but relatively inefficient at modifying and deleting data. Because modification and deletion are rarely used in practical applications, the overall performance of DBPS for accessing provenance information is outstanding and meets users' needs well.
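The abstract names a read/write-separated double buffer in front of a key-value persistence engine but gives no design details. The Python sketch below is one possible reading, under stated assumptions: new records accumulate in a write buffer and are flushed to the store in batches, while reads are served from a separate LRU read buffer. `DictKVStore` is an in-memory stand-in for a real key-value engine (for instance LevelDB or RocksDB; which engine DBPS uses is not stated here), and the class names, buffer capacities, and JSON value encoding are all hypothetical.

```python
from __future__ import annotations

import json
from collections import OrderedDict


class DictKVStore:
    """Hypothetical stand-in for a persistent key-value engine (get/put/delete)."""

    def __init__(self) -> None:
        self._data: dict[bytes, bytes] = {}
        self.writes = 0  # number of put() calls that reached the "disk"

    def get(self, key: bytes) -> bytes | None:
        return self._data.get(key)

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value
        self.writes += 1

    def delete(self, key: bytes) -> None:
        self._data.pop(key, None)


class DoubleBufferStore:
    """Hypothetical read/write-separated double cache over a KV store."""

    def __init__(self, kv: DictKVStore, write_capacity: int = 4, read_capacity: int = 128) -> None:
        self.kv = kv
        self.write_buf: dict[str, dict] = {}                   # pending creates, flushed in batches
        self.read_buf: OrderedDict[str, dict] = OrderedDict()  # LRU cache of loaded records
        self.write_capacity = write_capacity
        self.read_capacity = read_capacity

    def create(self, obj_id: str, record: dict) -> None:
        self.write_buf[obj_id] = record
        if len(self.write_buf) >= self.write_capacity:
            self.flush()

    def flush(self) -> None:
        for obj_id, record in self.write_buf.items():
            self.kv.put(obj_id.encode(), json.dumps(record).encode())
        self.write_buf.clear()

    def get(self, obj_id: str) -> dict | None:
        if obj_id in self.write_buf:           # newest data not yet flushed
            return self.write_buf[obj_id]
        if obj_id in self.read_buf:
            self.read_buf.move_to_end(obj_id)  # mark as most recently used
            return self.read_buf[obj_id]
        raw = self.kv.get(obj_id.encode())
        if raw is None:
            return None
        record = json.loads(raw)
        self.read_buf[obj_id] = record
        if len(self.read_buf) > self.read_capacity:
            self.read_buf.popitem(last=False)  # evict least recently used entry
        return record

    def update(self, obj_id: str, record: dict) -> None:
        # Invalidate the read cache, then change the copy wherever it lives.
        self.read_buf.pop(obj_id, None)
        if obj_id in self.write_buf:
            self.write_buf[obj_id] = record
        else:
            self.kv.put(obj_id.encode(), json.dumps(record).encode())

    def delete(self, obj_id: str) -> None:
        # Must clear both buffers and the persistent store.
        self.write_buf.pop(obj_id, None)
        self.read_buf.pop(obj_id, None)
        self.kv.delete(obj_id.encode())
```

In this sketch, creates and reads normally touch a single buffer, whereas updates and deletes must invalidate both buffers and write through to the store. That is one plausible way such a design ends up favoring creation and query over modification and deletion.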
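【Degree-granting institution】: University of Electronic Science and Technology of China
【Degree level】: Master's
【Year degree conferred】: 2016
【Classification number】: TP333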
