Research on a Hybrid Storage Architecture for Search Engines Based on Solid-State Drives
Published: 2018-03-08 11:45
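Topic: full-text retrieval. Focus: search engines. Source: Huazhong University of Science and Technology, master's thesis, 2012. Document type: degree thesis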
[Abstract]: Large search engines index hundreds of millions of documents and must process millions of query requests per second. Today, many large search engines store their massive index data on hard disk drives (HDDs), whose slow I/O becomes the main performance bottleneck. Unlike traditional mechanical disks, solid-state drives (SSDs), a newer class of storage device, offer many advantages, most notably high random-access performance. They also have potential drawbacks, such as a high cost per unit of capacity, asymmetric read and write speeds, and a limited number of block erase cycles. For these reasons, large search engines cannot yet replace disks entirely with SSDs.

A search engine is a typical I/O-intensive application with distinctive I/O patterns: read-dominated access, locality, skipped reads, and random reads. The SSD-based hybrid storage architecture for search engines is a trade-off among retrieval performance, hardware cost, and system reliability. It takes into account both the read/write characteristics of SSDs and the I/O characteristics of search engine workloads, caching hot data in memory and on the SSD to minimize the number of disk accesses and improve the system's I/O performance.

The data management strategy for this hybrid storage organizes the data on the SSD using a log-based approach, with the goals of improving retrieval performance and reducing block erase operations on the SSD. It has three parts. First, the data selection policy decides, according to the characteristics of the cached data, whether each item is stored in memory or on the SSD. Second, the data placement policy uses an improved log-based data management scheme to organize and manage the data on the SSD so that reads and writes remain efficient. Third, the data replacement policy applies different overwrite strategies to the cached results and the cached inverted lists on the SSD, avoiding expensive random writes as far as possible and reducing block erases. Experimental results confirm the effectiveness of these strategies: the cache hit rate increased by 13.31%, retrieval performance improved by 41.05%, the average Flash access time on the SSD fell by 43.83%, and the number of SSD block erases fell by 71.52%.
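The tiered lookup and log-style SSD placement described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's algorithm, whose details the abstract does not give. The class name `HybridCache`, its parameters, and the use of a fixed-size circular log as a stand-in for the "improved log-based" placement are all assumptions made for illustration. The sketch also does not model the separate selection and replacement policies for results and inverted lists.

```python
from collections import OrderedDict


class HybridCache:
    """Hypothetical three-tier lookup: memory LRU -> SSD log -> HDD store."""

    def __init__(self, mem_capacity, ssd_capacity, hdd):
        self.mem = OrderedDict()               # in-memory LRU: key -> value
        self.mem_capacity = mem_capacity
        self.ssd_log = [None] * ssd_capacity   # simulated SSD: circular log of (key, value)
        self.ssd_index = {}                    # key -> slot number in the log
        self.head = 0                          # next log slot to write
        self.hdd = hdd                         # dict-like backing store (simulated disk)
        self.stats = {"mem": 0, "ssd": 0, "hdd": 0}

    def get(self, key):
        # Tier 1: memory.
        if key in self.mem:
            self.mem.move_to_end(key)
            self.stats["mem"] += 1
            return self.mem[key]
        # Tier 2: SSD log, falling back to tier 3: HDD.
        slot = self.ssd_index.get(key)
        if slot is not None:
            self.stats["ssd"] += 1
            value = self.ssd_log[slot][1]
        else:
            self.stats["hdd"] += 1
            value = self.hdd[key]
        self._put_mem(key, value)
        return value

    def _put_mem(self, key, value):
        self.mem[key] = value
        if len(self.mem) > self.mem_capacity:
            # Evict the least recently used entry and demote it to the SSD.
            old_key, old_value = self.mem.popitem(last=False)
            self._append_ssd(old_key, old_value)

    def _append_ssd(self, key, value):
        if key in self.ssd_index:
            return  # a copy is already on the SSD; skip the extra write
        victim = self.ssd_log[self.head]
        if victim is not None:
            del self.ssd_index[victim[0]]      # oldest log entry is overwritten
        # Writes always go to the head of the log, never to an arbitrary slot.
        self.ssd_log[self.head] = (key, value)
        self.ssd_index[key] = self.head
        self.head = (self.head + 1) % len(self.ssd_log)


# Example: term -> posting-list stand-ins held on the simulated disk.
hdd = {f"t{i}": i for i in range(10)}
cache = HybridCache(mem_capacity=2, ssd_capacity=4, hdd=hdd)
for term in ["t0", "t1", "t2", "t0", "t0"]:
    cache.get(term)
print(cache.stats)  # {'mem': 1, 'ssd': 1, 'hdd': 3}
```

In this sketch every SSD write lands at the head of the log, so the simulated device only ever sees sequential writes, which is the property a log-based layout is meant to provide. A real implementation would work at the granularity of flash blocks rather than single entries.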
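[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year degree awarded]: 2012
[Classification number]: TP391.3; TP333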
Article ID: 1583762
Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/1583762.html