Research on Flash-Based Massive Non-Relational Storage Methods

Published: 2018-01-11 06:19

Keywords: flash-based massive non-relational storage methods. Source: Harbin Institute of Technology, 2012 master's thesis. Document type: degree thesis.

[Abstract]: As the Internet industry has grown, ever more data is being generated across all sectors, which has greatly promoted social progress, and new technologies have emerged to handle data at this scale. At the same time, hardware such as solid-state drives (SSDs) has greatly improved data access performance in applications. In massive-data applications, the most typical approach differs from the relational data model of traditional databases: it is the non-relational database deployed in distributed settings, as seen in the development of frameworks such as Hadoop. However, commonly used system architectures are built on traditional hardware and do not take the characteristics of SSDs into account, so performance optimization has concentrated on node communication, load balancing, and similar areas while overlooking advances in hardware. The focus of this thesis is therefore how to combine massive data, non-relational data structures, and SSDs, and how to optimize existing models to improve read and write performance. Building on existing, commonly used flash-based system architectures, the thesis proposes performance improvements for both writes and reads. For writes, suitable data structures and algorithms convert random writes into sequential writes. The thesis compares sequential and random write performance on SSDs, compares SSD performance with that of conventional hard disks, and completes the write optimization algorithm on the basis of fully considering and exploiting SSD hardware characteristics.
For reads, the thesis modifies the structure of the Bloom filter and proposes an SSD-based multiple Bloom filter, in which the traditional single-pass Bloom filter query becomes a step-by-step query in order to improve performance. By improving the data structure, the design exploits the fast reads of SSDs and takes into account the high cost of false-positive queries in Bloom filters to improve read performance.
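The abstract does not describe the internal layout of the multiple Bloom filter, so the following Python sketch is only one possible reading of a step-by-step query: several Bloom filters are consulted in order, and the lookup stops at the first stage that rules the key out. The class names (`BloomFilter`, `StagedBloomFilter`), the SHA-256-based hashing, and the stage sizes are all assumptions made for illustration and are not taken from the thesis.

```python
import hashlib


class BloomFilter:
    """A plain Bloom filter over a bit array (illustrative, not the thesis design)."""

    def __init__(self, num_bits, num_hashes, salt):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.salt = salt  # makes each stage hash keys differently
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{self.salt}:{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


class StagedBloomFilter:
    """Hypothetical staged filter: stages are checked in order with early exit."""

    def __init__(self, stage_bits, num_hashes=3):
        self.stages = [BloomFilter(bits, num_hashes, salt=f"stage{n}")
                       for n, bits in enumerate(stage_bits)]

    def add(self, key):
        # Every stage records the key, so a stored key is never rejected.
        for stage in self.stages:
            stage.add(key)

    def query(self, key):
        """Return (maybe_present, stages_checked)."""
        for checked, stage in enumerate(self.stages, start=1):
            if not stage.might_contain(key):
                return False, checked  # definite miss, later stages are skipped
        return True, len(self.stages)  # possibly present, go read the data
```

In this sketch a small first stage could be kept in memory while larger later stages are read from the SSD only when earlier stages report a possible hit. A key is treated as possibly present only if every stage agrees, which lowers the chance of paying for a data read caused by a false positive.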
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP333
