含有计数量词的图模式匹配研究与实现
发布时间:2019-03-09 18:32
【摘要】:计数量词作为一种增强表达的方式,加入到图模式匹配中可以更加准确地描述客观世界。通过简单地在查询图的边上附加计数量词可以很自然的表达出全部和存在量化表达,比例和数量聚集表达,以及否定表达。这种增加了表达能力的图模式匹配在社会媒体营销,知识发现,网络安全等领域都有广泛且直接的应用。关于量词化的图模式匹配问题的研究目前仍然比较新,而且该问题本身泛化了传统图模式匹配问题条件,传统图模式匹配问题仅属于本文问题的一个特例。因而本文所研究的量词化的图模式匹配问题复杂度远难于传统的图模式匹配。用边上计数量词是否包含否定计数量词可以将该问题划分成两个复杂度相异的两个子问题,第一类子问题模式图边上只包含肯定的计数量词,它的复杂度可以证明是NP完全的,并且在特定条件下复杂度的上下界都为NP难,而含有否定计数量词边的匹配问题复杂度则是DP完全的,特定条件下复杂度上下界为DP难,如此高的复杂度使得蛮力法解决该问题的开销无法令人接受。本文实现了(1)正QGP问题的匹配算法QMatch,该算法通过调用子图同构子算法而改进实现,本文通过实验验证发现算法在模式图规模在4以内时的匹配时间是比较好的,规模变大后的匹配时间呈指数式增长。(2)负QGP问题的匹配算法Inc QMatch,该算法避免了直接验证负计数量词的高复杂度,在正问题解决的基础上,利用集差思想,构建两个新的查询图结构得到最后的结果集,通过记录终结结果集的方式可以避免第二次查询从头开始有效地减少了运行时间,并且通过多线程的方式对该问题的匹配实现了并行化,实验表明当边上的负计数量词个数只有一时,Inc QMatch的时间开销在查询图规模超过4以后同样呈显指数式的增长。(3)利用哈希编码技术给节点周围边标签和节点标签进行编码,通过与的手段压缩周围结构信息,并将周围的标签种类信息压缩到一串二进制字串当中,通过或运算匹配节点,得到小规模候选集合,在本文实验中,使用8+8位编码,可以过滤掉95%的不匹配,优化效率非常明显,且在查询图规模是6的时候都表现出了小于10秒的运行时间,优化效果非常显著。
[Abstract]:Counting quantifiers, as an enhanced expression, can describe the objective world more accurately when added to graph pattern matching. By simply appending quantifiers to the edges of the query graph, it is natural to express all and existing quantitative expressions, proportion and number aggregation, and negative expressions. This kind of graph pattern matching, which increases the expression ability, is widely and directly applied in the fields of social media marketing, knowledge discovery, network security and so on. The research on quantized graph pattern matching problem is still relatively new, and the problem itself generalizes the condition of traditional graph pattern matching problem, and the traditional graph pattern matching problem is only a special case of this problem. Therefore, the complexity of quantized graph pattern matching problem studied in this paper is far more difficult than the traditional graph pattern matching problem. Whether the edge quantifier contains negative quantifier can divide the problem into two sub-problems with different complexity. The first subproblem contains only positive quantifiers on the edge of the pattern diagram, and its complexity can be proved to be NP complete. And the upper and lower bounds of complexity are difficult for NP under certain conditions, while the complexity of matching problems with negative number of words is DP complete, and the upper and lower bounds of complexity are difficult for DP under specific conditions. Such high complexity makes the cost of brute force solving the problem unacceptable. In this paper, (1) the matching algorithm QMatch, for positive QGP problem is implemented. This algorithm is improved by calling the subgraph isomorphism sub-algorithm. The experiment shows that the matching time of the algorithm is better when the scale of the pattern graph is less than 4, and the matching time of the algorithm is better when the scale of the pattern graph is less than 4. The matching time increases exponentially after the scale increases. (2) the matching algorithm Inc QMatch, of the negative QGP problem avoids the high complexity of verifying the negative quantifier directly, on the basis of solving the positive problem, the idea of set difference is used. Building two new query graph structures to get the final result set, by recording the end result set, can prevent the second query from starting from scratch effectively reducing the run time. And the problem is parallelized by multi-thread matching. The experiment shows that when the number of negative quantifiers on the edge is only one time, the number of negative quantifiers on the edge is only one time. The time cost of Inc QMatch increases exponentially after the size of the query graph exceeds 4. (3) Hash coding technology is used to encode the edge tags and node labels around the nodes and compress the surrounding structural information by means of the same method. And the surrounding label type information is compressed into a string of binary strings, and small-scale candidate sets can be obtained by or operation of matching nodes. In this experiment, 95% mismatch can be filtered out by using 8.8-bit encoding. The optimization efficiency is very obvious, and the run time is less than 10 seconds when the query graph scale is 6, and the optimization effect is very remarkable.
【学位授予单位】:哈尔滨工业大学
【学位级别】:硕士
【学位授予年份】:2017
【分类号】:TP391.41
本文编号:2437744
[Abstract]:Counting quantifiers, as an enhanced expression, can describe the objective world more accurately when added to graph pattern matching. By simply appending quantifiers to the edges of the query graph, it is natural to express all and existing quantitative expressions, proportion and number aggregation, and negative expressions. This kind of graph pattern matching, which increases the expression ability, is widely and directly applied in the fields of social media marketing, knowledge discovery, network security and so on. The research on quantized graph pattern matching problem is still relatively new, and the problem itself generalizes the condition of traditional graph pattern matching problem, and the traditional graph pattern matching problem is only a special case of this problem. Therefore, the complexity of quantized graph pattern matching problem studied in this paper is far more difficult than the traditional graph pattern matching problem. Whether the edge quantifier contains negative quantifier can divide the problem into two sub-problems with different complexity. The first subproblem contains only positive quantifiers on the edge of the pattern diagram, and its complexity can be proved to be NP complete. And the upper and lower bounds of complexity are difficult for NP under certain conditions, while the complexity of matching problems with negative number of words is DP complete, and the upper and lower bounds of complexity are difficult for DP under specific conditions. Such high complexity makes the cost of brute force solving the problem unacceptable. In this paper, (1) the matching algorithm QMatch, for positive QGP problem is implemented. This algorithm is improved by calling the subgraph isomorphism sub-algorithm. The experiment shows that the matching time of the algorithm is better when the scale of the pattern graph is less than 4, and the matching time of the algorithm is better when the scale of the pattern graph is less than 4. The matching time increases exponentially after the scale increases. (2) the matching algorithm Inc QMatch, of the negative QGP problem avoids the high complexity of verifying the negative quantifier directly, on the basis of solving the positive problem, the idea of set difference is used. Building two new query graph structures to get the final result set, by recording the end result set, can prevent the second query from starting from scratch effectively reducing the run time. And the problem is parallelized by multi-thread matching. The experiment shows that when the number of negative quantifiers on the edge is only one time, the number of negative quantifiers on the edge is only one time. The time cost of Inc QMatch increases exponentially after the size of the query graph exceeds 4. (3) Hash coding technology is used to encode the edge tags and node labels around the nodes and compress the surrounding structural information by means of the same method. And the surrounding label type information is compressed into a string of binary strings, and small-scale candidate sets can be obtained by or operation of matching nodes. In this experiment, 95% mismatch can be filtered out by using 8.8-bit encoding. The optimization efficiency is very obvious, and the run time is less than 10 seconds when the query graph scale is 6, and the optimization effect is very remarkable.
【学位授予单位】:哈尔滨工业大学
【学位级别】:硕士
【学位授予年份】:2017
【分类号】:TP391.41
【参考文献】
相关期刊论文 前6条
1 张丽霞;王伟平;高建良;王建新;;利用局部评估的分布式图模式匹配算法[J];国防科技大学学报;2016年02期
2 张丽霞;王伟平;高建良;王建新;;面向模式图变化的增量图模式匹配[J];软件学报;2015年11期
3 于静;刘燕兵;张宇;刘梦雅;谭建龙;郭莉;;大规模图数据匹配技术综述[J];计算机研究与发展;2015年02期
4 吕雪栋;冯志勇;王鑫;饶国政;付宇新;;StepMatch:一种基于BSP计算模型的SPARQL基本图模式匹配算法[J];计算机研究与发展;2013年S2期
5 张航;王宏志;李建中;高宏;;基于2-hop优化的子图模式匹配算法[J];黑龙江大学自然科学学报;2010年01期
6 吴祉群;吉方;黄文;;基于遗传算法图像模式匹配的收敛性研究[J];现代电子技术;2007年01期
,本文编号:2437744
本文链接:https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2437744.html