Research on Gene Interaction Detection Algorithms Based on the Markov Blanket
Published: 2019-01-14 16:33
[Abstract]: A Bayesian network is a graphical model based on probabilistic inference. Within a Bayesian network, the Markov blanket of a target variable is the set of variables relevant to that target, identified from the conditional dependence relationships among the variables. Studies have found that Markov blanket methods are well suited to epistasis detection in genome-wide association analysis. A number of Markov blanket-based epistasis detection algorithms have been proposed in recent years, but on large-scale genome-wide data they still suffer from low detection efficiency and high false positive rates. This thesis studies these problems further. To improve on the existing Markov blanket-based epistasis detection algorithms, it proposes an optimized algorithm, OMBED (Optimized Markov Blanket for Epistasis Detection). The algorithm has three phases: a removal phase, a forward phase, and a backward phase. The removal phase uses conditional independence tests to remove irrelevant variables from the candidate set. The forward phase uses the G² test statistic to measure the strength of association between variables: strongly associated variables are added to the target set, weakly associated variables are removed from the candidate set, and the result is a minimal Markov blanket. The backward phase removes false positive variables from the Markov blanket. Compared with the original algorithm, OMBED optimizes the add and remove operations of the forward phase, which reduces the number of G² tests and lowers the algorithm's complexity. Experiments on a series of simulated datasets and on real datasets show that the algorithm achieves good detection efficiency and a lower false positive rate.
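The three phases described in the abstract can be illustrated with a minimal Python sketch. It is not the OMBED algorithm itself, whose details the abstract does not give: it is a generic grow-and-shrink Markov blanket search built on a stratified G² test. The function names (`g2_statistic`, `chi2_sf`, `markov_blanket_sketch`), the significance level `alpha = 0.05`, the use of a marginal test in the removal phase, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def g2_statistic(x, y, z=None):
    """Return (G2, df) for testing whether x is independent of y given z.

    x, y: equal-length sequences of discrete values.
    z: optional sequence of hashable stratum keys (one per sample);
       None means an unconditional test.
    """
    if z is None:
        z = [()] * len(x)
    strata = defaultdict(list)
    for xi, yi, zi in zip(x, y, z):
        strata[zi].append((xi, yi))
    g2, df = 0.0, 0
    for pairs in strata.values():
        n = len(pairs)
        joint = Counter(pairs)
        row = Counter(a for a, _ in pairs)
        col = Counter(b for _, b in pairs)
        # G2 = 2 * sum(O * ln(O / E)); empty cells contribute nothing.
        for (a, b), observed in joint.items():
            expected = row[a] * col[b] / n
            g2 += 2.0 * observed * math.log(observed / expected)
        df += (len(row) - 1) * (len(col) - 1)
    return max(g2, 0.0), df


def chi2_sf(stat, df):
    """Upper-tail chi-square probability (finite series, integer df)."""
    if df <= 0:
        return 1.0
    half = stat / 2.0
    if df % 2 == 0:
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    root = math.sqrt(stat)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= stat / (2 * r + 1)
    tail = math.erfc(root / math.sqrt(2.0))
    return tail + 2.0 * math.exp(-half) / math.sqrt(2.0 * math.pi) * total


def markov_blanket_sketch(data, target, alpha=0.05):
    """data: dict mapping variable name -> list of discrete values."""
    y = data[target]

    def test(name, cond):
        z = list(zip(*(data[c] for c in cond))) if cond else None
        stat, df = g2_statistic(data[name], y, z)
        return stat, chi2_sf(stat, df)

    # Removal phase (assumed form): drop marginally independent variables.
    candidates = [v for v in data if v != target and test(v, [])[1] <= alpha]
    blanket = []
    # Forward phase: add the strongest candidate; candidates that look
    # independent given the current blanket leave the candidate set for good.
    while candidates:
        scored = {v: test(v, blanket) for v in candidates}
        candidates = [v for v in candidates if scored[v][1] <= alpha]
        if not candidates:
            break
        best = max(candidates, key=lambda v: scored[v][0])
        blanket.append(best)
        candidates.remove(best)
    # Backward phase: drop members that are independent given the rest.
    for v in list(blanket):
        rest = [b for b in blanket if b != v]
        if test(v, rest)[1] > alpha:
            blanket.remove(v)
    return blanket


# Toy data (hypothetical): phenotype y is determined by genotype a only.
toy = {
    "a": [i % 3 for i in range(120)],
    "noise": [(i // 3) % 2 for i in range(120)],
}
toy["y"] = [1 if g == 2 else 0 for g in toy["a"]]
print(markov_blanket_sketch(toy, "y"))  # prints ['a']
```

Each call to `test` in the forward loop is one G² test, so the number of such calls is the cost that the thesis reports reducing. A removal phase based on marginal tests, as assumed here, would also discard SNPs whose effect is purely interactive; the abstract does not say how OMBED's removal phase treats that case.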
【Degree-granting institution】: Hunan Normal University
【Degree level】: Master's
【Year degree conferred】: 2016
【Classification number】: TP18
Article ID: 2408880
Article link: https://www.wllwen.com/kejilunwen/jiyingongcheng/2408880.html