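Research on Feature Selection Based on Evolutionary Algorithms
Published: 2018-06-12 02:30
Topic: feature selection + genetic algorithm; Source: Master's thesis, Hebei University, 2017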
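【Abstract】:Feature selection is the process of selecting a feature subset from the full initial feature set according to a given criterion. Removing redundant features reduces the complexity of a learning algorithm and improves its performance. Feature selection is an effective way to counter the curse of dimensionality and plays an important role in machine learning, so research on it has significant theoretical and practical value, especially for machine learning in the era of big data. For feature selection on discrete-valued features, this thesis proposes two feature selection methods based on evolutionary algorithms. The first method uses relative classification information entropy as the fitness function that measures the importance of a feature subset, proves theoretically that this measure is valid, and searches for the optimal feature subset with evolutionary algorithms (a genetic algorithm and particle swarm optimization). The second method is similar to the first, except that it uses the inconsistency rate as the fitness function. A comparative study of the two methods leads to the following conclusions: (a) with the same fitness function, searching for the optimal feature subset with particle swarm optimization outperforms searching with a genetic algorithm in both test accuracy and convergence speed; (b) across fitness functions, the evolutionary feature selection method that uses relative classification information entropy outperforms the one that uses the inconsistency rate. The thesis also studies how the proposed algorithms extend to continuous-valued features. The proposed algorithms have three characteristics: (1) they are simple and easy to implement; (2) the selected feature subsets have strong representational power; (3) they have good semantic interpretability.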
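The abstract names two fitness measures for discrete data but does not give their formulas, so the sketch below is illustrative only. The inconsistency rate follows its common definition: instances that agree on every selected feature but carry different class labels count as inconsistent. The thesis's exact definition of relative classification information entropy is not stated here, so `entropy_score` is a hypothetical stand-in, namely the conditional entropy of the class given the subset divided by the class entropy. All function names and the toy data are assumptions. For both measures, lower values indicate a more useful subset.

```python
from collections import Counter, defaultdict
from math import log2


def group_labels(rows, labels, subset):
    """Collect class labels by the tuple of values an instance takes on `subset`."""
    groups = defaultdict(list)
    for row, y in zip(rows, labels):
        groups[tuple(row[j] for j in subset)].append(y)
    return groups


def inconsistency_rate(rows, labels, subset):
    """Share of instances that the features in `subset` cannot separate.

    For each value pattern, the inconsistency count is the pattern's size minus
    its majority-class count; the rate is the total count over all instances.
    """
    groups = group_labels(rows, labels, subset)
    bad = sum(len(ys) - max(Counter(ys).values()) for ys in groups.values())
    return bad / len(labels)


def conditional_entropy(rows, labels, subset):
    """H(class | subset) in bits; 0 when `subset` fully determines the class."""
    n = len(labels)
    h = 0.0
    for ys in group_labels(rows, labels, subset).values():
        p_group = len(ys) / n
        for count in Counter(ys).values():
            p = count / len(ys)
            h -= p_group * p * log2(p)
    return h


def entropy_score(rows, labels, subset):
    """Hypothetical stand-in for relative classification information entropy:
    H(class | subset) / H(class), in [0, 1]; lower means a more informative subset."""
    h_class = conditional_entropy(rows, labels, [])
    return 0.0 if h_class == 0 else conditional_entropy(rows, labels, subset) / h_class


# Toy data (assumed): the class is f0 XOR f1; f2 is a redundant copy of f1.
rows = [(0, 0, 0), (0, 1, 1), (1, 0, 0), (1, 1, 1)]
labels = [0, 1, 1, 0]
print(inconsistency_rate(rows, labels, [0]))     # 0.5
print(inconsistency_rate(rows, labels, [0, 1]))  # 0.0
print(entropy_score(rows, labels, [0]))          # 1.0
print(entropy_score(rows, labels, [0, 2]))       # 0.0
```

The toy data shows why a subset measure is needed rather than per-feature scores: neither f0 nor f1 alone says anything about the XOR class, yet together they determine it completely.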
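The abstract reports that, under the same fitness function, particle swarm search beats genetic-algorithm search in accuracy and convergence speed, but it does not describe the encodings or operators used. The following is a generic sketch under stated assumptions, not the thesis's implementation: each feature subset is a 0/1 mask; the PSO is the common binary variant with a sigmoid transfer from velocity to bit probability; the GA uses binary tournament selection, one-point crossover, bit-flip mutation and elitism. The fitness (inconsistency rate plus a small size penalty `alpha`), every parameter value, and the toy data are assumptions chosen for illustration, and running it does not reproduce the thesis's comparison.

```python
import math
import random
from collections import Counter, defaultdict


def inconsistency_rate(rows, labels, mask):
    """Inconsistency rate of the features switched on in a 0/1 mask."""
    groups = defaultdict(list)
    for row, y in zip(rows, labels):
        groups[tuple(v for v, m in zip(row, mask) if m)].append(y)
    bad = sum(len(ys) - max(Counter(ys).values()) for ys in groups.values())
    return bad / len(labels)


def fitness(rows, labels, mask, alpha=0.01):
    """Lower is better. `alpha` is an assumed per-feature size penalty."""
    return inconsistency_rate(rows, labels, mask) + alpha * sum(mask) / len(mask)


def binary_pso(rows, labels, n_particles=10, iters=30, w=0.7, c1=1.5, c2=1.5,
               vmax=4.0, seed=0):
    rng = random.Random(seed)
    d = len(rows[0])
    pos = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n_particles)]
    pos[0] = [1] * d  # one particle starts from the full feature set
    vel = [[0.0] * d for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [fitness(rows, labels, p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for j in range(d):
                v = (w * vel[i][j]
                     + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                     + c2 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-vmax, min(vmax, v))
                # Sigmoid transfer: the velocity sets the probability that bit j is 1.
                pos[i][j] = int(rng.random() < 1.0 / (1.0 + math.exp(-vel[i][j])))
            f = fitness(rows, labels, pos[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
            if f < gbest_f:
                gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f


def genetic_algorithm(rows, labels, pop_size=10, gens=30, p_cross=0.8,
                      p_mut=0.05, seed=0):
    rng = random.Random(seed)
    d = len(rows[0])

    def fit(m):
        return fitness(rows, labels, m)

    pop = [[rng.randint(0, 1) for _ in range(d)] for _ in range(pop_size)]
    pop[0] = [1] * d  # one individual starts from the full feature set
    best = min(pop, key=fit)[:]
    for _ in range(gens):
        children = [best[:]]  # elitism: carry over the best mask so far
        while len(children) < pop_size:
            # Binary tournament selection for each parent.
            p1, p2 = (min(rng.sample(pop, 2), key=fit) for _ in range(2))
            if rng.random() < p_cross and d > 1:
                cut = rng.randint(1, d - 1)  # one-point crossover
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (p1, p2):  # bit-flip mutation
                children.append([1 - b if rng.random() < p_mut else b for b in child])
        pop = children[:pop_size]
        cand = min(pop, key=fit)
        if fit(cand) < fit(best):
            best = cand[:]
    return best, fit(best)


if __name__ == "__main__":
    # Toy data (assumed): 6 binary features, class = f0 XOR f1, f2..f5 are noise.
    rng = random.Random(1)
    rows = [[rng.randint(0, 1) for _ in range(6)] for _ in range(40)]
    labels = [r[0] ^ r[1] for r in rows]
    for search in (binary_pso, genetic_algorithm):
        mask, f = search(rows, labels)
        print(search.__name__, mask, round(f, 4))
```

Both searches seed one candidate with the full feature set and only ever replace their best mask with a strictly better one, so the returned subset is never worse than using every feature under this fitness. Swapping `inconsistency_rate` for an entropy-based score changes only the `fitness` function, which mirrors how the abstract varies the fitness while keeping the search strategy fixed.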
【Degree-granting institution】:Hebei University
【Degree level】:Master's
【Year awarded】:2017
【Classification number】:TP18
Article No.: 2007868
Link: https://www.wllwen.com/kejilunwen/zidonghuakongzhilunwen/2007868.html