Research on SVDD-Based Feature Selection Methods and Their Applications
Published: 2018-03-15 21:49
Topic: support vector data description. Focus: feature selection. Source: Soochow University (苏州大学), 2015 master's thesis. Type: degree thesis.
[Abstract]: In cancer classification, gene expression data have thousands to tens of thousands of dimensions, and some features are correlated with one another. How to quickly extract informative low-dimensional data from large volumes of high-dimensional gene expression data has therefore attracted growing attention from researchers. This thesis studies feature selection methods based on support vector data description (SVDD) and applies them to gene selection in expression data, removing irrelevant and redundant genes and retaining the most informative ones in order to improve cancer classification performance. The contributions of the thesis are as follows.

First, a fast feature selection algorithm based on the SVDD model is proposed. Feature selection methods based on SVDD already exist, but they are computationally expensive and take too long to select features. To address this problem, the thesis proposes a fast SVDD-based feature selection algorithm. The new method selects features by ranking the energy along the direction of the center of the hypersphere formed by SVDD, and uses recursive feature elimination to gradually remove redundant features. Experimental results on the Leukemia and Colon Tumor datasets show that the new method selects features quickly and that the selected features are effective for subsequent cancer classification.

Second, a fast feature selection algorithm based on multiple SVDD models is proposed. The SVDD-based feature selection algorithm described above trains on only one class of data and ignores the other classes, so it is suitable only for one-class or two-class data. In practice, however, multi-class data are more common. For multi-class problems, the thesis proposes a fast feature selection algorithm based on multiple SVDDs. The algorithm builds one SVDD feature selection model for each class, so that multiple feature subsets are selected, and then fuses the selected subsets to obtain a more effective feature subset. Experiments on two two-class cancer datasets and three multi-class cancer datasets verify that the proposed method can select more discriminative feature subsets.
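The abstract gives only an outline of the two algorithms, so the following is a minimal sketch under stated assumptions rather than the thesis's actual implementation. It assumes a linear-kernel, hard-margin SVDD (equivalent to a minimum enclosing ball, approximated here with Badoiu-Clarkson style updates), takes a feature's "energy" to be the squared component of the hypersphere center along that feature, and fuses per-class subsets by set union. The function names `svdd_center`, `svdd_rfe`, and `multi_svdd_select` are hypothetical.

```python
def svdd_center(X, cols, iters=200):
    """Approximate the center of a hard-margin, linear-kernel SVDD
    (minimum enclosing ball) restricted to the feature columns `cols`.
    ASSUMPTION: no slack variables and no kernel; the thesis may differ."""
    c = [X[0][j] for j in cols]
    for t in range(1, iters + 1):
        # Find the sample farthest from the current center estimate.
        far = max(X, key=lambda x: sum((x[j] - cj) ** 2
                                       for j, cj in zip(cols, c)))
        # Move the center a shrinking step toward that sample.
        c = [cj + (far[j] - cj) / (t + 1) for j, cj in zip(cols, c)]
    return c


def svdd_rfe(X, n_keep, iters=200):
    """Recursive feature elimination: retrain, then drop the feature with
    the smallest squared center component, until `n_keep` features remain.
    ASSUMPTION: 'energy' of a feature = squared center coordinate."""
    cols = list(range(len(X[0])))
    while len(cols) > n_keep:
        center = svdd_center(X, cols, iters)
        energy = [cj * cj for cj in center]
        worst = min(range(len(cols)), key=energy.__getitem__)
        del cols[worst]
    return cols


def multi_svdd_select(X, y, n_keep, iters=200):
    """Run one SVDD-based selection per class and fuse the subsets.
    ASSUMPTION: fusion is a plain set union of the per-class subsets."""
    selected = set()
    for label in sorted(set(y)):
        X_class = [x for x, lab in zip(X, y) if lab == label]
        selected.update(svdd_rfe(X_class, n_keep, iters))
    return sorted(selected)
```

Calling `multi_svdd_select(X, y, n_keep)` with `X` as a list of equal-length feature rows and `y` as the class labels returns the sorted indices of the fused feature subset. Because this ranking depends on the scale and offset of each feature, real gene expression data would presumably need normalization first; the abstract does not say how the thesis handles that.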
[Degree-granting institution]: Soochow University (苏州大学)
[Degree level]: Master's
[Year degree awarded]: 2015
[Classification number]: R730.4; TP18