稀疏数据的贝叶斯网络结构学习

发布时间：2018-04-09 23:05

本文选题：稀疏数据　切入点：贝叶斯网络　出处：《山东师范大学》2017年硕士论文

【摘要】：图模型被广泛用于表示和分析随机变量之间的因果关系以及条件独立性.图模型中主要包括有向无环图,无向图和链图.其中有向无环图,也被称作贝叶斯网络,图中的边都是有向边,并且不能构成有向环.贝叶斯网络用来描述随机变量的因果关系.本文主要提出对贝叶斯网络结构进行学习的算法.贝叶斯网络结构学习算法主要有三类:1基于独立性检验的约束算法;2基于评分搜索的算法;3将独立性检验和评分搜索综合利用起来的算法.2008年,Xie和Geng针对贝叶斯网络结构学习提出贝叶斯网络结构学习的递归分解算法.这个算法是将大规模的贝叶斯网络结构学习问题递归地分割为较小规模的结构学习的问题.这个算法主要应用无向独立图的构建,这就导致了它有两个困难:一是在数据稀疏,变量较多的情况下无向独立图的构建不够准确;二是当变量较多时,无向图的构建也是比较复杂的.2013年Cai等人提出了可测因果分割算法(Scalable cAusation Discovery Al-gorithm, SADA) .这个算法将变量集 V 分割成三个集合 (V1,C,V2),其中只要保证在给定C的情况下,V1与V2之间没有边直接相连即可,并不需要它们条件独立,所以能够解决数据稀疏的困难.但是,Cai的算法在合并的过程中有可能出现假边,针对这个问题,本论文提出一种再学习的检查学习算法,这种算法的提出,结合了Cai和Xie算法中的优势,解决了稀疏数据贝叶斯网络结构学习的问题.本文提出的算法,首先将一个贝叶斯网络的变量集合不断地调用可测因果分割算法进行分割;然后在每一组因果分割上,先进行局部结构学习再进行合并得到可能有假边存在整体结构;最后寻找因果分割集及其邻居集合,在这之上调用再学习检查算法,进行修正学习以得到正确贝叶斯网络的骨架图,最后利用Meek准则确定出等价类来.
[Abstract]:Graph models are widely used to express and analyze causality and conditional independence between random variables.The graph model mainly includes directed acyclic graph, undirected graph and chain graph.The directed acyclic graph is also called Bayesian network. The edges of the graph are directed edges and cannot form directed rings.Bayesian networks are used to describe the causality of random variables.In this paper, a learning algorithm for Bayesian network structure is proposed.There are three kinds of Bayesian network structure learning algorithms: one is a constraint algorithm based on independence test; the other is an algorithm based on score search. In 2008, Xie and Geng aimed at BayesianA recursive decomposition algorithm for learning Bayesian network structure is proposed.This algorithm recursively divides the large-scale Bayesian network structure learning problem into smaller scale structural learning problems.This algorithm mainly applies the construction of undirected independent graph, which leads to two difficulties: one is that the construction of undirected independent graph is not accurate enough when the data is sparse, and the other is that when there are more variables, the construction of undirected independent graph is not accurate enough.The construction of undirected graph is also complicated. In 2013, Cai et al proposed scalable cAusation Discovery algorithm (SADAA).In this algorithm, the variable set V is divided into three sets: v _ 1 / C _ 1 / V _ 2, which ensures that there is no direct connection between V _ 1 and V _ 2 under a given C condition, and they do not need to be conditional independent, so the difficulty of data sparsity can be solved.However, false edges may appear in the merging process of Cai algorithm. In view of this problem, this paper proposes a relearning check learning algorithm, which combines the advantages of Cai and Xie algorithms.The problem of sparse data Bayesian network structure learning is solved.The algorithm proposed in this paper firstly segments a set of Bayesian network variables by calling the testable causality segmentation algorithm continuously, and then on each set of causal segmentation,The local structure learning is carried out first, then the global structure with false edges is obtained. Finally, the causal partition set and its neighbor set are found, and then the learning and re-checking algorithm is called.The correct skeleton diagram of Bayesian network is obtained by modified learning, and the equivalent class is determined by using Meek criterion.
【学位授予单位】：山东师范大学
【学位级别】：硕士
【学位授予年份】：2017
【分类号】：F224

【相似文献】