Application of the Boosting Method in Discriminant Analysis of Gene Microarray Data
Published: 2018-10-23 21:19
[Abstract]: The rapid development of high-throughput microarray technology has provided statisticians with large volumes of microarray data. Such "small sample, high dimension" data (m >> n) pose an unprecedented challenge to traditional classification and discriminant methods. Boosting, a member of the family of ensemble algorithms, has long attracted researchers and practitioners with its "perfect" classification ability.

This study systematically introduces the basic idea of Boosting and the basic procedures of two of its algorithms, AdaBoost and LogitBoost. It then uses these two Boosting algorithms to build discriminant prediction models on simulated data and on lower-dimensional data, and compares their predictive performance with that of two other ensemble algorithms (Bagging and Random-Forest) and three traditional discriminant analysis methods (Fisher's linear discriminant, Fisher's quadratic discriminant, and logistic regression discriminant).

Taking into account the special nature of gene microarray data, this study analyzes two online datasets, a leukemia dataset and a breast cancer dataset, as follows: (1) an FDR-controlling procedure is used to adjust the P values, and gene variables are screened at P ≤ 0.05 or P ≤ 0.01 so that the dimension is smaller than the sample size; discriminant prediction models are then built, and Boosting is compared with the two other ensemble algorithms and the three traditional methods; (2) different numbers of gene predictors are selected according to the ranking of their P values, a discriminant prediction model is built for each, and the relative advantage of Boosting is examined in terms of prediction accuracy and sensitivity; (3) principal components are extracted and a principal component discriminant analysis is performed to examine the advantages of Boosting. In all three cases, cross-validation is used to assess the predictive performance of the models and the stability of the predictions.

Main conclusions: 1. The overall predictive performance of Boosting is generally better than that of Bagging, Random-Forest, and the traditional ...
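The gene-screening part of steps (1) and (2) can be illustrated with a minimal sketch. The abstract only says that an FDR-controlling procedure was used; the Benjamini-Hochberg adjustment below is an assumption, and the function names `bh_adjust` and `select_genes` and the `max_genes` parameter are hypothetical names chosen for illustration, not taken from the thesis.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumption: the thesis does not name its FDR procedure; BH is used
    here only as a common example.
    """
    m = len(p_values)
    # Indices of the genes, sorted by ascending raw p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_genes(p_values, alpha=0.05, max_genes=None):
    """Return gene indices with adjusted p <= alpha, ordered by raw p-value.

    max_genes (hypothetical) optionally keeps only the top-ranked genes,
    mirroring the idea of trying different numbers of predictors.
    """
    adjusted = bh_adjust(p_values)
    ranked = sorted(range(len(p_values)), key=lambda i: p_values[i])
    keep = [i for i in ranked if adjusted[i] <= alpha]
    if max_genes is not None:
        keep = keep[:max_genes]
    return keep


if __name__ == "__main__":
    # Toy per-gene p-values, invented purely for illustration.
    toy_p = [0.01, 0.04, 0.03, 0.005]
    print(bh_adjust(toy_p))
    print(select_genes(toy_p, alpha=0.03))
```

The selected indices would then be used to subset the expression matrix before fitting any classifier. To avoid optimistic accuracy estimates, the screening is usually repeated inside each cross-validation fold rather than done once on the full data; whether the thesis did so is not stated in the abstract.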
Article ID: 2290491
Article link: https://www.wllwen.com/yixuelunwen/binglixuelunwen/2290491.html