Classifier Selection Ensembles and Their Application in Gene Data Analysis
Topic: ensemble learning + selective ensemble; Source: master's thesis, Dalian University of Technology, 2016
【Abstract】: Ensemble learning combines the classification results of multiple classifiers for the same problem in order to obtain better classification performance. However, not every classifier in an ensemble contributes to the combined result; selective ensemble tries to pick out a subset of classifiers that performs well, improving overall performance while reducing the memory and computational cost of the ensemble. This thesis proposes two selective ensemble methods: a static selective ensemble based on the kappa coefficient and a dynamic selective ensemble based on the firefly algorithm. The static method is suited to small datasets, and the dynamic method to large ones. Before classifier selection, genes relevant to classification are first chosen with a rank aggregation algorithm based on data perturbation, the chosen genes are grouped with affinity propagation clustering, and one gene is drawn at random from each group to form the final gene subset. The resulting subset is therefore relevant to classification, while the correlation among its genes stays low. Once the base classifiers are trained, the first method keeps the classifiers whose kappa value exceeds a threshold, and the second uses a clustering-like procedure to select classifiers that are both accurate and mutually diverse. Experiments on five gene datasets show that both proposed methods achieve higher accuracy than the comparison methods and classical methods. The first method quickly selects a suitable classifier subset when the dataset is small; the second saves more time on large datasets while still yielding good classification results.
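The preprocessing pipeline described in the abstract (perturbation-based rank aggregation, affinity propagation grouping, one random gene per group) can be illustrated roughly as follows. This is a minimal sketch on synthetic data, not the thesis implementation: the rank-aggregation step is approximated by a single ANOVA F-score ranking, and the sample size, gene count and similarity measure are arbitrary assumptions.

```python
# Sketch of the gene preprocessing stage: (1) rank genes by class relevance,
# (2) group the top-ranked genes with affinity propagation, (3) keep one
# random gene per group. The perturbation-based rank aggregation from the
# thesis is approximated by a single F-score ranking; values are illustrative.
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.feature_selection import f_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 500))          # 60 samples x 500 genes (synthetic)
y = rng.integers(0, 2, size=60)         # binary class labels

# Step 1: score genes and keep the top-ranked ones (stand-in for rank aggregation).
scores, _ = f_classif(X, y)
top = np.argsort(scores)[::-1][:100]    # 100 most class-relevant genes

# Step 2: cluster the selected genes by the similarity of their expression profiles.
ap = AffinityPropagation(affinity="euclidean", random_state=0)
labels = ap.fit_predict(X[:, top].T)    # genes as rows -> one cluster label per gene

# Step 3: pick one gene at random from each cluster to form the final subset.
gene_subset = [int(rng.choice(top[labels == c])) for c in np.unique(labels)]
print("selected genes:", gene_subset)
```

Choosing only one gene per cluster is what keeps the final subset both class-relevant and internally weakly correlated, as the abstract notes.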
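The static selection step can be sketched as below. The abstract does not state whether kappa is computed pairwise between classifiers or against the true labels; the sketch takes the second reading and keeps base classifiers whose Cohen's kappa on a held-out validation set exceeds a threshold. The pool construction, threshold value and dataset are illustrative assumptions, not the thesis setup.

```python
# Sketch of kappa-threshold static selection over a bagging-style pool.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=50, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Train a pool of decision trees on bootstrap samples.
rng = np.random.default_rng(0)
pool = []
for _ in range(20):
    idx = rng.integers(0, len(X_tr), size=len(X_tr))   # bootstrap sample
    pool.append(DecisionTreeClassifier(random_state=0).fit(X_tr[idx], y_tr[idx]))

# Keep classifiers whose kappa against the validation labels exceeds the threshold.
kappa_threshold = 0.4                                  # illustrative value
selected = [clf for clf in pool
            if cohen_kappa_score(y_val, clf.predict(X_val)) > kappa_threshold]
if not selected:
    selected = pool                                    # fall back to the full pool

# Majority vote of the selected subset.
votes = np.mean([clf.predict(X_val) for clf in selected], axis=0)
print(f"kept {len(selected)} of {len(pool)} classifiers,",
      "ensemble accuracy:", np.mean((votes >= 0.5) == y_val))
```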
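The abstract gives no details of the firefly-based dynamic selection beyond its name. As one possible illustration, the sketch below runs a generic binary firefly search over classifier subsets, scoring each subset by the validation accuracy of its majority vote. The encoding, attractiveness formula, parameters (`beta0`, `gamma`, `alpha`) and the stand-in prediction matrix are all assumptions, not the thesis's algorithm.

```python
# Sketch of a binary firefly search for a classifier subset: fireflies are
# continuous positions in [0, 1]^n_clf, thresholded at 0.5 into subset masks;
# brightness is the majority-vote accuracy of the masked pool on validation data.
import numpy as np

rng = np.random.default_rng(0)

# Assumed inputs: a (n_classifiers x n_val_samples) 0/1 prediction matrix from
# an existing pool, plus the true validation labels (stand-in random data here).
n_clf, n_val = 20, 90
val_preds = rng.integers(0, 2, size=(n_clf, n_val))
y_val = rng.integers(0, 2, size=n_val)

def fitness(mask):
    """Validation accuracy of the majority vote over the selected classifiers."""
    if mask.sum() == 0:
        return 0.0
    votes = val_preds[mask.astype(bool)].mean(axis=0) >= 0.5
    return float(np.mean(votes == y_val))

# Firefly parameters (illustrative values).
n_fireflies, n_iter, beta0, gamma, alpha = 15, 50, 1.0, 0.1, 0.3
pos = rng.random((n_fireflies, n_clf))            # continuous positions
masks = (pos > 0.5).astype(int)
bright = np.array([fitness(m) for m in masks])

for _ in range(n_iter):
    for i in range(n_fireflies):
        for j in range(n_fireflies):
            if bright[j] > bright[i]:             # move i toward a brighter firefly j
                r2 = np.sum((pos[i] - pos[j]) ** 2)
                beta = beta0 * np.exp(-gamma * r2)
                pos[i] += beta * (pos[j] - pos[i]) + alpha * (rng.random(n_clf) - 0.5)
        pos[i] = np.clip(pos[i], 0.0, 1.0)
        masks[i] = (pos[i] > 0.5).astype(int)
        bright[i] = fitness(masks[i])

best = masks[np.argmax(bright)]
print("selected classifiers:", np.flatnonzero(best), "fitness:", bright.max())
```

Because the subset is re-evaluated on validation data at every move, this kind of search trades extra evaluation cost for flexibility, which matches the abstract's claim that the dynamic method pays off on larger datasets.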
【Degree-Granting Institution】: Dalian University of Technology
【Degree Level】: Master
【Year of Award】: 2016
【CLC Number】: TP18