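A Semiring Provenance Computation Method for Data Fusion
Published: 2018-01-01 08:35
Keywords for this article: a semiring provenance computation method for data fusion. Source: 《计算机研究与发展》 (Journal of Computer Research and Development), 2016, Issue 02. Paper type: journal article
More related articles: data fusion, semiring provenance, polynomial system, derivation tree, recursive query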
[Abstract]: Data fusion underpins the quality of integrated data and is a prerequisite for analysis and mining. To users, however, data fusion as a whole is a black-box process, so current data fusion pipelines lack interpretability and debuggability. To support effective conflict detection and debugging during data fusion, data provenance techniques are needed to make the fusion process traceable. Data provenance describes the entire process by which data is produced and evolves over time. The semiring provenance model, a classical representation of data provenance, captures not only which source data a result was derived from but also how it was derived. This paper studies the computation of semiring provenance for data fusion. Such computation follows a pay-as-you-go model, and computing the provenance of data is very time-consuming. First, an approximate iterative method based on the Kleene sequence is proposed, and its relationship to the derivation-tree definition of semiring provenance is established, which proves the method correct. Next, a Newton-like sequence is proposed, which converges better than the Kleene sequence. Because recursion may prevent these two iterative algorithms from terminating, the paper analyzes the characteristics of the semiring polynomial provenance of result tuples and proves that both approximate algorithms terminate after at most n iterations in the worst case. Finally, experiments show that the proposed methods are feasible and effective.
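To make the Kleene-style iteration concrete, here is a minimal sketch in Python. It is not the paper's algorithm. It assumes a recursive reachability query (`T(x,y) :- E(x,y)` and `T(x,y) :- T(x,z), E(z,y)`) and an absorptive semiring of minimal witness sets, chosen here only because it guarantees that the iteration reaches a fixpoint even when the data contains cycles. The function names, the edge tokens, and the `max_iters` cap are all hypothetical.

```python
from itertools import product

# Assumption: a provenance value is a set of monomials, and each monomial is
# a set of source-tuple tokens (a "witness"). Absorption drops any witness
# that strictly contains another one.

def minimize(terms):
    """Keep only the minimal witnesses (absorption law a + a*b = a)."""
    terms = set(terms)
    return frozenset(t for t in terms if not any(o < t for o in terms))

def add(a, b):
    """Semiring addition: alternative derivations."""
    return minimize(a | b)

def mul(a, b):
    """Semiring multiplication: joint use of tuples in one derivation."""
    return minimize(x | y for x, y in product(a, b))

ZERO = frozenset()  # no derivation at all

def kleene_provenance(edges, max_iters=100):
    """Iterate k(0) = 0, k(i+1) = F(k(i)) until nothing changes.

    edges maps (src, dst) to a token naming that source tuple.
    Returns the provenance of every derivable T(x, y) and the number of
    rounds performed, including the round that confirmed the fixpoint.
    """
    base = {e: frozenset({frozenset({tok})}) for e, tok in edges.items()}
    nodes = sorted({n for e in edges for n in e})
    current = {}  # k(0): every T(x, y) is ZERO
    for i in range(1, max_iters + 1):
        nxt = {}
        for x, y in product(nodes, nodes):
            val = base.get((x, y), ZERO)              # T(x,y) :- E(x,y)
            for z in nodes:                           # T(x,y) :- T(x,z), E(z,y)
                if (z, y) in base and (x, z) in current:
                    val = add(val, mul(current[(x, z)], base[(z, y)]))
            if val != ZERO:
                nxt[(x, y)] = val
        if nxt == current:
            return current, i
        current = nxt
    raise RuntimeError("no fixpoint within max_iters")

# Hypothetical input with a cycle a -> c -> a.
edges = {("a", "b"): "p", ("b", "c"): "q", ("a", "c"): "r", ("c", "a"): "s"}
prov, rounds = kleene_provenance(edges)
```

In this example `prov[("a", "c")]` holds two witnesses, `{r}` and `{p, q}`: the direct edge and the two-step path. Walks that go around the cycle are absorbed because they reuse a superset of those tokens. Over the free polynomial semiring, where nothing is absorbed, the same loop would keep growing on cyclic input, which is the non-termination issue the abstract refers to.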
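[Author affiliation]: School of Information Science and Engineering, Northeastern University
[Funding]: National Natural Science Foundation of China (61472070); National Basic Research Program of China (973 Program) (2012CB316201)
[CLC number]: TP202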
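[Text snapshot]: With the rapid development of the Internet, Web technology has quickly spread worldwide thanks to its breadth, interactivity, speed, and openness, and it has penetrated every sector of society. The number of websites and web pages is growing exponentially. Integrating massive volumes of high-value Web information accurately and effectively is especially important for analytical applications such as market intelligence analysis, public opinion analysis, and business intelligence, and has very important …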
Article ID: 1363747
Article link: https://www.wllwen.com/kejilunwen/zidonghuakongzhilunwen/1363747.html