Application of the Two-Index Importance-Priority Method to Classification Problems

Published: 2018-12-11 07:59

【Abstract】: This thesis performs variable selection and class prediction on the Leukemia 72 microarray data, and for the first time applies Sure Independent Screening (SIS) together with the idea of two-index importance-priority dimension reduction to variable selection for classification data. Using hypothesis testing, it first shows that the null hypothesis proposed by Robert Tibshirani in [15] is unreasonable, and it proposes a new statistic that corrects these shortcomings. Combining the statistical meaning of this statistic with the SIS idea, it then proposes new variable selection models for different data types and different sample sizes. Because SIS ignores the correlation among variables, the importance-priority idea is added to these models, which yields the two-index importance-priority dimension reduction method. Support vector machine (SVM), naive Bayes, and nearest-neighbor classifiers are then applied to the data after variable selection, and the misclassification rate is used to identify the best classification model. Finally, the models are applied to both simulated and real data. Comparison with variable selection by the rank-sum test and with a fast variable screening method demonstrates the feasibility and stability of the proposed models.
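The screen-then-classify workflow described in the abstract can be illustrated with a minimal sketch. It is not the thesis's method: the new statistic and the importance-priority step are not specified in the abstract, so a Welch two-sample t statistic stands in as the marginal screening score, and a 1-nearest-neighbor rule stands in for the classifiers. The function names (`t_statistic`, `screen`, `nn_predict`, `error_rate`, `simulate`) and the simulated data are all hypothetical.

```python
import math
import random


def t_statistic(x0, x1):
    """Welch two-sample t statistic (a stand-in for the thesis's statistic)."""
    n0, n1 = len(x0), len(x1)
    m0, m1 = sum(x0) / n0, sum(x1) / n1
    v0 = sum((v - m0) ** 2 for v in x0) / (n0 - 1)
    v1 = sum((v - m1) ** 2 for v in x1) / (n1 - 1)
    se = math.sqrt(v0 / n0 + v1 / n1)
    return (m1 - m0) / se if se > 0 else 0.0


def screen(X, y, d):
    """Marginal screening: keep the d variables with the largest |t|."""
    p = len(X[0])
    scores = []
    for j in range(p):
        x0 = [row[j] for row, label in zip(X, y) if label == 0]
        x1 = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append(abs(t_statistic(x0, x1)))
    return sorted(range(p), key=lambda j: scores[j], reverse=True)[:d]


def nn_predict(X_train, y_train, x, cols):
    """1-nearest-neighbor prediction using only the selected columns."""
    nearest = min(
        range(len(X_train)),
        key=lambda i: sum((X_train[i][j] - x[j]) ** 2 for j in cols),
    )
    return y_train[nearest]


def error_rate(X_train, y_train, X_test, y_test, cols):
    """Misclassification rate on the test set."""
    wrong = sum(
        nn_predict(X_train, y_train, x, cols) != t
        for x, t in zip(X_test, y_test)
    )
    return wrong / len(y_test)


def simulate(n, p, informative, shift, rng):
    """Two balanced classes; only the informative columns differ in mean."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([
            rng.gauss(shift * label if j in informative else 0.0, 1.0)
            for j in range(p)
        ])
        y.append(label)
    return X, y


rng = random.Random(0)
informative = {0, 1, 2}  # assumed ground truth for the simulation
X_train, y_train = simulate(60, 200, informative, 3.0, rng)
X_test, y_test = simulate(40, 200, informative, 3.0, rng)

selected = screen(X_train, y_train, d=3)
err = error_rate(X_train, y_train, X_test, y_test, selected)
print("selected variables:", sorted(selected))
print("misclassification rate:", err)
```

Because the score is computed one variable at a time, this sketch shares the weakness the abstract attributes to SIS: it ignores correlation among variables, which is the gap the importance-priority step is meant to address.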
【Degree-granting institution】: Lanzhou University
【Degree level】: Master's
【Year degree granted】: 2017
【Classification number】: O212
