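An Efficient Spatio-Temporal Classification Index on HBase
Published: 2018-05-21 10:58
Topics: stream data + HBase; Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2017, No. 06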
Abstract: Massive stream data is characterized by large volume, fast updates, multiple dimensions, and multiple attributes, and its storage and querying have become an active research topic in both academia and industry in recent years. HBase provides a highly scalable set of techniques and a system platform for storing and managing massive stream data. However, HBase supports only a primary-key index (the row key), so queries on non-primary-key data are inefficient, especially for multidimensional data. For the traffic stream data scenario, this paper proposes TA-index, an index structure with high insertion and query efficiency. TA-index takes the temporal and spatial locality of data access into account in order to capture the characteristics of the data more accurately; by building separate classified indexes over time and space, it reduces the volume of index data and provides real-time data analysis capability. Experiments show that the algorithm is more efficient than existing algorithms and is highly scalable, supporting high throughput and efficient multidimensional queries at the same time.
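The abstract does not describe how TA-index is laid out, so the following is not the paper's method. It is only a minimal sketch of the general problem the abstract mentions: a store sorted by a single key answers a multidimensional query efficiently only if time and space are encoded into that key. The `SortedTable` class is a toy in-memory stand-in, not the HBase client API, and the names `row_key`, `query_cell`, `CELL_DEG`, and `BUCKET_SEC` are hypothetical choices made for illustration.

```python
import bisect

CELL_DEG = 0.01    # hypothetical grid cell size, in degrees
BUCKET_SEC = 300   # hypothetical time bucket width, in seconds


def row_key(ts, lon, lat, vehicle_id):
    """Build a composite key: time bucket, grid cell, then timestamp and id."""
    bucket = ts // BUCKET_SEC
    cx = int((lon + 180.0) / CELL_DEG)
    cy = int((lat + 90.0) / CELL_DEG)
    # Fixed-width, zero-padded fields make string order match numeric order.
    return f"{bucket:010d}|{cx:05d}|{cy:05d}|{ts:010d}|{vehicle_id}"


class SortedTable:
    """Toy table kept sorted by row key (an assumption, not the HBase API)."""

    def __init__(self):
        self._keys = []
        self._rows = {}

    def put(self, key, value):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def scan_prefix(self, prefix):
        """Return (key, value) pairs whose key starts with prefix, in key order."""
        start = bisect.bisect_left(self._keys, prefix)
        out = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break  # keys are sorted, so no later key can match
            out.append((key, self._rows[key]))
        return out


def query_cell(table, ts, lon, lat):
    """Fetch records in the same time bucket and grid cell as the given point."""
    key = row_key(ts, lon, lat, "")
    # Drop the trailing timestamp and id fields to get the bucket/cell prefix.
    prefix = key.rsplit("|", 2)[0] + "|"
    return table.scan_prefix(prefix)
```

With this layout, a query for one time bucket and one grid cell touches a single contiguous key range instead of scanning the whole table. The price is that the key order fixes which dimension is cheap to filter on first, which is the kind of trade-off a dedicated secondary index is meant to relax.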
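Author affiliation: College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics
Funding: Supported by the National Natural Science Foundation of China (61373015)
Classification: TP311.13; U495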
Article ID: 1918920
Article link: https://www.wllwen.com/kejilunwen/daoluqiaoliang/1918920.html