Research on Update and Query Optimization Methods for Probabilistic Relational Databases with Integrity Constraints

Published: 2018-03-01 02:35

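Keywords: probabilistic relational database; possible worlds; constraint-based update; functional dependency; relational query. Source: Huazhong University of Science and Technology, 2016 doctoral dissertation. Document type: dissertation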

[Abstract]: As applications such as data cleaning, sensor networks, and moving-object tracking place ever higher demands on the management of uncertain data, the probabilistic relational data model, an important model for managing such data effectively, has attracted close attention from academia and industry since 2003. Formally, a probabilistic relational database is a probability distribution over a set of conventional relational databases (the possible worlds), and integrity constraints carry important information about relational data. It is therefore of significant value to propose a probabilistic relational database model that includes integrity constraints and to study update and query methods on that model.

Most current research on uncertain data models concentrates on describing constraint relationships among specific data items and does not consider schema-level constraints. To address this, a probabilistic relational database model with integrity constraints is proposed. Schema-level integrity constraints on uncertain data capture the correlations among data under dynamic updates. Constraint-based updates of the probabilistic relational database therefore update these correlations automatically and effectively prevent the database from containing unreasonable possible worlds.

Existing methods for converting uncertain data from the possible-world-set representation to a variable-based representation produce very long tuple expressions. By analyzing the rules by which tuple expressions are generated, an efficient data model conversion method is proposed. It is based on a formula that eliminates repeated variables in expressions, which reduces the computational cost of processing tuple expressions in subsequent queries. Experiments show that the conversion method greatly simplifies tuple expressions without adding extra time overhead and improves the efficiency of subsequent query processing.

Existing constraint-based update methods enumerate the values of all variables that appear in the tuple expressions of the database, which leads to high time complexity. To solve this, an efficient update method is proposed. It considers only the values of the variables that appear in the constraint and uses a variable substitution mechanism to update tuple expressions, so that the other variables in the database are not involved. Experiments show that the method outperforms existing update methods across the various parameter configurations tested.

Obtaining the set of value assignments of the relevant variables that satisfy the constraint is an important and very time-consuming step, yet existing constraint-based update methods do not optimize it for the characteristics of the common functional dependency constraints. Two update optimization strategies are proposed. The pruning strategy traverses the expression of each relevant tuple separately rather than traversing one complex expression composed of all the relevant tuple expressions. This reduces the number of variables traversed and hence the time needed to obtain the satisfying assignments. Building on the pruning strategy, the variable elimination strategy merges multiple assignments that satisfy the constraint and correspond to the same possible world, minimizing the number of newly generated variables and benefiting subsequent query processing. Experimental results show that the pruning strategy further improves the efficiency of constraint-based updates, while the variable elimination strategy reduces the number of newly generated variables without introducing extra overhead.

Most general query optimization methods for probabilistic relational databases focus on accelerating the processing of the lineage logical expressions of query results and do not consider generating simplified result lineage expressions during query processing. An optimization method is proposed that uses schema-level constraint information to simplify the lineage expressions of query results. Using two kinds of schema-level information, functional dependency constraints and referential integrity constraints, simplified ways of generating the lineage of two relational operations are given.

Hypothetical queries have important applications for probabilistic relational databases. The current general approach processes them by generating a new database version, which incurs extra update overhead. To avoid this, an optimization method that handles hypothetical queries using conditional probability is proposed. By computing the conditional probability of the result under the hypothetical condition, the method avoids unnecessary updates of the probabilistic relational database. Experimental results verify the effectiveness of both the general query optimization method and the hypothetical query optimization method.
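
The possible-worlds view and the conditional-probability treatment of hypothetical queries described in the abstract can be illustrated with a small sketch. This is not the dissertation's algorithm: it enumerates worlds explicitly instead of using the variable-based representation, and it assumes that a constraint-based update amounts to discarding worlds that violate the constraint and renormalizing. The relation, the functional dependency `name -> city`, and all function names are invented for illustration.

```python
from fractions import Fraction

# Toy probabilistic relation (hypothetical data): each possible world is a
# set of (name, city) tuples, mapped to its probability.
WORLDS = {
    frozenset({("alice", "Wuhan"), ("bob", "Beijing")}): Fraction(1, 2),
    # This world violates the functional dependency name -> city.
    frozenset({("alice", "Wuhan"), ("alice", "Beijing")}): Fraction(1, 4),
    frozenset({("alice", "Beijing"), ("bob", "Beijing")}): Fraction(1, 4),
}


def satisfies_fd(world):
    """Check the functional dependency name -> city inside one world."""
    city_of = {}
    for name, city in world:
        if city_of.setdefault(name, city) != city:
            return False
    return True


def probability(worlds, event):
    """Total probability of the worlds in which `event` holds."""
    return sum((p for w, p in worlds.items() if event(w)), Fraction(0))


def condition(worlds, predicate):
    """Assumed update semantics: drop violating worlds, then renormalize."""
    kept = {w: p for w, p in worlds.items() if predicate(w)}
    total = sum(kept.values(), Fraction(0))
    if total == 0:
        raise ValueError("no possible world satisfies the predicate")
    return {w: p / total for w, p in kept.items()}


def hypothetical(worlds, event, hypothesis):
    """P(event | hypothesis), computed without building a new database."""
    denominator = probability(worlds, hypothesis)
    if denominator == 0:
        raise ValueError("hypothesis has probability zero")
    joint = probability(worlds, lambda w: event(w) and hypothesis(w))
    return joint / denominator


def alice_in_wuhan(world):
    return ("alice", "Wuhan") in world


def bob_in_beijing(world):
    return ("bob", "Beijing") in world


# Enforcing the functional dependency removes the inconsistent world.
consistent = condition(WORLDS, satisfies_fd)

# Hypothetical query: "if alice were in Wuhan, how likely is bob in Beijing?"
answer = hypothetical(WORLDS, bob_in_beijing, alice_in_wuhan)
```

In this sketch `hypothetical` returns the same value as first materializing the conditioned database with `condition` and then querying it, which is the point the abstract makes about avoiding the update. An explicit enumeration like this is exponential in general, so it is only a way to see the semantics.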

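[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Doctoral
[Year degree granted]: 2016
[Classification number]: TP311.13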
