A Query Strategy for Massive Semantic Data in Cloud Environments
Published: 2018-11-14 07:52
[Abstract]: To query massive RDF data efficiently, this paper studies how RDF data is stored in the distributed database HBase and designs a two-stage, MapReduce-based query strategy consisting of a SPARQL preprocessing stage and a distributed query execution stage. The preprocessing stage implements JOVR, a query partitioning algorithm based on the correlation degree of SPARQL variables: it computes the correlation degree of the variables in a SPARQL query to determine the order in which the join variables are joined, then uses the join variables to partition the join operations between SPARQL clauses into the smallest number of MapReduce jobs. The execution stage runs the MapReduce jobs produced by the preprocessing stage, querying the massive RDF data in parallel. The strategy is validated on the LUBM standard test dataset. The results show that the JOVR algorithm queries massive RDF data efficiently and is stable and scalable.
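The abstract does not define the correlation degree or the exact partitioning rule, so the following is only a minimal sketch of the general idea, not the JOVR algorithm itself. It assumes the correlation degree of a variable is simply the number of triple patterns (or intermediate results) that mention it, and it greedily packs joins on different variables into the same round whenever they touch disjoint inputs, with each round standing in for one MapReduce job. The names `variables`, `plan_jobs`, and `LUBM_STYLE_QUERY` are hypothetical.

```python
import re
from collections import Counter

def variables(pattern):
    """Return the set of SPARQL variables (?name) in one triple pattern."""
    return frozenset(re.findall(r"\?\w+", pattern))

def plan_jobs(patterns):
    """Greedily group joins into rounds; each round stands for one MapReduce job.

    Assumption: 'degree' (a stand-in for correlation degree) is the number
    of current relations that mention a variable.
    """
    relations = [variables(p) for p in patterns]
    jobs = []
    while len(relations) > 1:
        degree = Counter(v for rel in relations for v in rel)
        # Only variables shared by at least two relations can drive a join.
        candidates = sorted((v for v, d in degree.items() if d > 1),
                            key=lambda v: (-degree[v], v))
        if not candidates:
            raise ValueError("patterns share no join variable")
        used = set()    # indexes of relations already consumed in this round
        merged = []     # variable sets of the intermediate results
        joins = []      # join variables handled in this round
        for var in candidates:
            group = [i for i, rel in enumerate(relations)
                     if var in rel and i not in used]
            if len(group) < 2:
                continue  # nothing left to join on this variable this round
            used.update(group)
            joins.append(var)
            merged.append(frozenset().union(*(relations[i] for i in group)))
        jobs.append(joins)
        relations = merged + [rel for i, rel in enumerate(relations)
                              if i not in used]
    return jobs

# Illustrative query in LUBM-style vocabulary (an assumption, not taken
# from the paper's experiments).
LUBM_STYLE_QUERY = [
    "?x rdf:type ub:GraduateStudent",
    "?y rdf:type ub:University",
    "?z rdf:type ub:Department",
    "?x ub:memberOf ?z",
    "?z ub:subOrganizationOf ?y",
    "?x ub:undergraduateDegreeFrom ?y",
]
```

Under these assumptions, `plan_jobs(LUBM_STYLE_QUERY)` plans two rounds: the joins on `?x` and `?y` run together in the first, and the join on `?z` combines their results in the second.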
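[Affiliation]: School of Software, Central South University;
[Funding]: Supported by the National Natural Science Foundation of China (61301136, 61572525, 61602525)
[Classification number]: TP311.13
Article ID: 2330532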
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2330532.html