Query Strategy for Massive Semantic Data in Cloud Environments

Published: 2018-11-14 07:52

[Abstract]: To query massive RDF data efficiently, this paper studies how to store RDF data in the distributed database HBase. It designs a two-stage, MapReduce-based query strategy that divides a query into a SPARQL preprocessing stage and a distributed query execution stage. The preprocessing stage implements JOVR, a query partitioning algorithm based on the correlation degree of SPARQL variables: it computes the correlation degree of the variables in a SPARQL query to determine the order in which the join variables are joined, then uses the join variables to partition the join operations between SPARQL clauses into the minimum number of MapReduce tasks. The execution stage runs the MapReduce tasks produced by the preprocessing stage, querying the massive RDF data in parallel. The query strategy is validated on the LUBM standard benchmark dataset. The results show that the JOVR algorithm queries massive RDF data efficiently and has strong stability and scalability.
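
The preprocessing idea described in the abstract can be illustrated with a rough sketch. The abstract does not define how JOVR computes correlation degree or how it assigns joins to tasks, so everything here is an assumption made for illustration: the degree of a variable is taken to be the number of triple patterns that mention it, and joins are grouped greedily so that no triple pattern takes part in two joins within the same task. The function names (`variable_degrees`, `join_order`, `plan_jobs`) and the sample query are hypothetical and are not taken from the paper.

```python
from collections import Counter

def variable_degrees(patterns):
    """Count, for each variable, how many triple patterns mention it.

    Assumption: this count stands in for the paper's "correlation degree".
    """
    counts = Counter()
    for pattern in patterns:
        # Use a set so a variable repeated inside one pattern counts once.
        for var in {term for term in pattern if term.startswith("?")}:
            counts[var] += 1
    return counts

def join_order(patterns):
    """Return join variables (degree >= 2), highest degree first.

    Ties are broken alphabetically to keep the order deterministic.
    """
    degrees = variable_degrees(patterns)
    joins = [var for var, degree in degrees.items() if degree >= 2]
    return sorted(joins, key=lambda var: (-degrees[var], var))

def plan_jobs(patterns):
    """Greedily group join variables into rounds (one round per job).

    Simplification: within a round, each triple pattern may take part in
    at most one join; conflicting variables are deferred to a later round.
    Intermediate join results are not modelled.
    """
    remaining = join_order(patterns)
    jobs = []
    while remaining:
        used = set()            # indices of patterns consumed in this round
        job, deferred = [], []
        for var in remaining:
            touched = {i for i, p in enumerate(patterns) if var in p}
            if touched & used:
                deferred.append(var)
            else:
                job.append(var)
                used |= touched
        jobs.append(job)
        remaining = deferred
    return jobs

# Hypothetical query written in the style of the LUBM vocabulary.
patterns = [
    ("?x", "rdf:type", "ub:GraduateStudent"),
    ("?x", "ub:takesCourse", "?y"),
    ("?y", "rdf:type", "ub:Course"),
    ("?z", "ub:teacherOf", "?y"),
]

print(join_order(patterns))  # join variables, highest degree first
print(plan_jobs(patterns))   # one inner list per round
```

In this sample, `?y` appears in three patterns and `?x` in two, so `?y` is joined first; because both variables share the `ub:takesCourse` pattern, the sketch places them in separate rounds.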
[Affiliation]: School of Software, Central South University
[Funding]: National Natural Science Foundation of China (61301136, 61572525, 61602525)
[CLC number]: TP311.13
