Research and Implementation of an RDF Data Storage and Query Method Based on a Structured Index

Published: 2018-04-20 22:24

Topics: RDF + HBase; Source: Beijing Jiaotong University, master's thesis, 2013

[Abstract]: With the development of the Internet and the Internet of Things, the volume of data on the network has grown explosively, placing new demands on data sharing and processing; under big-data conditions, increasingly complex semantic relationships must be processed and applied. Methods for storing and querying large-scale RDF data, and for effectively supporting downstream processing such as data mining, are of significant theoretical and practical value for data processing in many industries, including computing, manufacturing, and railways. Motivated by the requirements of a railway sensor application, this thesis proposes an HBase-based RDF data storage and query method built on a structured index.

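First, to meet the storage requirements of large-scale data, a structure-based indexing method for RDF data is proposed. An index graph is constructed by analyzing the connection relationships between nodes in the data graph, and the index is used to partition the data so that data sharing the same structure is stored together, which lowers the cost of querying and speeds up queries.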
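As a rough illustration of structure-based partitioning, the sketch below groups triples by a structural signature of their subject. The abstract does not say how the thesis defines "structure", so the signature used here (the set of outgoing predicates of each subject) is an assumption, and the function name `partition_by_structure` and the sample sensor data are hypothetical.

```python
from collections import defaultdict


def partition_by_structure(triples):
    """Group triples so that subjects with the same outgoing-predicate set share a partition.

    Assumption: "same structure" is approximated by the set of predicates
    leaving a subject; the thesis may define structure differently.
    """
    predicates_of = defaultdict(set)
    for subject, predicate, _ in triples:
        predicates_of[subject].add(predicate)

    partitions = defaultdict(list)
    for subject, predicate, obj in triples:
        signature = frozenset(predicates_of[subject])
        partitions[signature].append((subject, predicate, obj))
    return dict(partitions)


# Hypothetical sample data, loosely modelled on a sensor scenario.
sample_triples = [
    ("sensor1", "type", "TemperatureSensor"),
    ("sensor1", "locatedAt", "track7"),
    ("sensor2", "type", "TemperatureSensor"),
    ("sensor2", "locatedAt", "track9"),
    ("track7", "partOf", "lineA"),
]
partitions = partition_by_structure(sample_triples)
```

Here both sensors land in one partition because they expose the same predicates, while the track triple lands in another, so a query over sensor-shaped data only needs to touch the first partition.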

Second, a scheme for storing RDF data in HBase is proposed. Data is partitioned according to the structured index, and the HBase storage layout is built around triples in "predicate-subject-object" order. A row-key encoding method is also proposed to handle multi-valued properties in RDF data, which effectively narrows the range scanned for target data and improves query efficiency.
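The following sketch shows one way a predicate-subject-object row key could keep multiple values of the same property in separate rows while still allowing a narrow range scan. The abstract does not give the thesis's actual encoding, so the separator, the hashed object suffix, and the names `make_row_key` and `prefix_scan` are assumptions. No HBase client is used; a sorted Python list stands in for HBase's lexicographically ordered row keys.

```python
import bisect
import hashlib

SEP = "\x00"  # assumed separator; sorts before any printable character


def make_row_key(predicate, subject, obj):
    """Build a predicate|subject|hash(object) key (hypothetical encoding).

    The object hash makes keys differ for each value of a multi-valued property.
    """
    digest = hashlib.md5(obj.encode("utf-8")).hexdigest()[:8]
    return SEP.join((predicate, subject, digest))


def prefix_scan(sorted_rows, prefix):
    """Return the (key, value) rows whose key starts with prefix.

    Mimics a range scan over sorted keys: jump to the first candidate,
    then stop at the first key that no longer matches.
    """
    keys = [key for key, _ in sorted_rows]
    start = bisect.bisect_left(keys, prefix)
    matched = []
    for key, value in sorted_rows[start:]:
        if not key.startswith(prefix):
            break
        matched.append((key, value))
    return matched


# Hypothetical sample data: sensor1 has two values for hasReading.
reading_triples = [
    ("sensor1", "hasReading", "21.5"),
    ("sensor1", "hasReading", "22.0"),
    ("sensor10", "hasReading", "19.8"),
    ("sensor1", "locatedAt", "track7"),
]
rows = sorted((make_row_key(p, s, o), o) for s, p, o in reading_triples)
readings = prefix_scan(rows, "hasReading" + SEP + "sensor1" + SEP)
```

Because the prefix ends with the separator, the scan returns both readings of `sensor1` without touching the rows of `sensor10` or of other predicates.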

Third, an RDF query method based on the structured index and on reordering of SPARQL statements is proposed. A relevance score is computed from the binding relationships among unknown variables across the statements of a query and from the cost of executing each statement, and the SPARQL statements are reordered accordingly. The reordered statements are then evaluated in two layers, a structured-index lookup followed by a physical query, which yields a clear improvement in query efficiency.
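To make the reordering idea concrete, the sketch below greedily orders triple patterns by preferring patterns that share variables with those already chosen, and breaking ties by estimated cost. This is a generic heuristic and not the thesis's relevance formula, which the abstract does not state; the function `reorder`, the cost numbers, and the sample patterns are all hypothetical.

```python
def variables(pattern):
    """Return the SPARQL variables (terms starting with '?') in a triple pattern."""
    return {term for term in pattern if term.startswith("?")}


def reorder(patterns, cost):
    """Greedy ordering of triple patterns (illustrative heuristic only).

    At each step, pick the pattern that shares the most variables with the
    patterns already selected; among equals, pick the cheapest one.
    """
    remaining = list(patterns)
    ordered = []
    bound = set()
    while remaining:
        best = min(
            remaining,
            key=lambda p: (-len(variables(p) & bound), cost[p]),
        )
        remaining.remove(best)
        ordered.append(best)
        bound |= variables(best)
    return ordered


# Hypothetical query patterns and made-up cost estimates.
query_patterns = [
    ("?s", "hasReading", "?v"),
    ("?s", "type", "TemperatureSensor"),
    ("?s", "locatedAt", "?t"),
    ("?t", "partOf", "lineA"),
]
estimated_cost = {
    query_patterns[0]: 10000,
    query_patterns[1]: 500,
    query_patterns[2]: 800,
    query_patterns[3]: 20,
}
plan = reorder(query_patterns, estimated_cost)
```

With these numbers the cheapest pattern (`?t partOf lineA`) runs first, and each later pattern joins on a variable that is already bound, so the expensive `hasReading` pattern is evaluated last against a small set of candidates.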

Finally, the overall query efficiency of the system is verified experimentally in the railway sensor application scenario; it achieves better query efficiency than Sesame, a classic RDF data storage and retrieval system. The thesis contains 29 figures, 20 tables, and 40 references.

[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year degree awarded]: 2013
[Classification number]: TP333

[References]