适用于海量数据应用的多维Hash表结构
发布时间:2018-06-09 15:55
本文选题:多维 + Hash表 ; 参考:《清华大学学报(自然科学版)》2017年06期
【摘要】:传统的Hash表通过对目标数据进行Hash计算,可以实现数据的快速存取与检索。为了保持较好的存储性能,需要使整个Hash表保持疏松的状态,从而牺牲掉10%~25%的空间。这对于海量数据存储而言,是一种巨大的空间浪费。该文提出一种多维Hash表结构,通过增加Hash表在逻辑上的维度,大大降低了Hash表的冲突率,实现了在较高的填充率下获得较满意的性能。实验结果表明:在千万的数据量级上,二维Hash表的冲突率比传统Hash表的减小2~4个数量级,总体性能则提升了1个数量级。该文还在原有填充率的基础上,提出失效率的概念,进一步完善和统一了Hash表性能评价指标。
[Abstract]:The traditional Hash table can quickly access and retrieve the target data by Hash calculation. In order to maintain good storage performance, it is necessary to keep the whole Hash table loose at the expense of 10 or 25% of the space. This is a huge waste of space for massive data storage. In this paper, a multidimensional Hash table structure is proposed. By increasing the logical dimension of the Hash table, the collision rate of the Hash table is greatly reduced, and the satisfactory performance is achieved at a higher filling rate. The experimental results show that the collision rate of the two-dimensional Hash table is 2 ~ 4 orders of magnitude less than that of the traditional Hash table, and the overall performance of the two-dimensional Hash table is improved by 1 order of magnitude. Based on the original filling rate, the concept of failure rate is put forward, and the performance evaluation index of Hash table is further improved and unified.
【作者单位】: 国防科技大学计算机学院;
【分类号】:TP311.12
,
本文编号:2000297
本文链接:https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2000297.html