Research on Feature Selection Based on the Cuckoo Search Algorithm
Published: 2019-01-27 22:43
[Abstract]: Machine learning is one of the most active fields today and has drawn research interest and attention from many sectors. It has already been applied in medicine, image processing, text processing, and other areas, and it continues to develop vigorously as a new force. Machine learning is data-driven, yet real-world problems often involve many potentially influential factors, so the original feature set is high-dimensional. High dimensionality not only lowers training and prediction efficiency; it can also make a learner sensitive to irrelevant and redundant features and thus reduce accuracy. Feature selection is an effective way to address these problems. This thesis first introduces the workflow of feature preprocessing and feature selection, then classifies feature selection methods from two perspectives, search strategy and evaluation criterion, and describes the methods in each class. The support vector machine (SVM) is chosen as the criterion for judging the quality of feature selection results, and its principle, derivation, kernel concept, and generalization methods are introduced. Particle swarm optimization (PSO) and the cuckoo search (CS) algorithm can both serve as random search strategies for feature selection, so the thesis describes the principles and methods of the two algorithms and derives their binary versions as needed. After analyzing the search behavior of the binary cuckoo algorithm, the thesis proposes a new binary cuckoo algorithm that optimizes both search direction and search precision. On this basis it proposes the PSO-NCS algorithm, which combines the improved binary cuckoo algorithm with PSO. PSO-NCS exploits the convergent, clustering behavior of PSO and the global search ability of CS so that the two complement each other: the algorithm can escape local optima and also achieves better convergence accuracy. Experiments with PSO-NCS on several data sets give results slightly better than those of the other algorithms compared, showing that on feature selection problems it has stronger optimization ability and converges faster.
[Degree-granting institution]: Jilin University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP18
Article ID: 2416775
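The abstract does not spell out the binary cuckoo variant that the thesis starts from or the improved version it proposes, so the following is only a minimal sketch of one common baseline formulation. It assumes a sigmoid transfer function to turn continuous nest positions into 0/1 feature masks and Mantegna's method for drawing Lévy steps. All function names, parameter defaults, and the toy fitness are illustrative assumptions; in the setting the abstract describes, the fitness would be something like SVM classification accuracy on the selected features.

```python
import math
import random


def levy_step(rng, beta=1.5):
    """Draw one heavy-tailed step length with Mantegna's method."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma = (num / den) ** (1 / beta)
    u = rng.gauss(0, sigma)
    v = abs(rng.gauss(0, 1)) or 1e-12  # guard against division by zero
    return u / v ** (1 / beta)


def sigmoid(x):
    """Numerically safe logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def binarize(position, rng):
    """Map a continuous position to a 0/1 mask: bit j is 1 with probability sigmoid(x_j)."""
    return [1 if rng.random() < sigmoid(x) else 0 for x in position]


def binary_cuckoo_search(fitness, n_features, n_nests=10, n_iter=50,
                         pa=0.25, alpha=1.0, seed=0):
    """Maximize fitness(mask) over 0/1 masks; returns (best_mask, best_score)."""
    rng = random.Random(seed)

    def new_nest():
        pos = [rng.uniform(-1, 1) for _ in range(n_features)]
        mask = binarize(pos, rng)
        return pos, mask, fitness(mask)

    nests = [new_nest() for _ in range(n_nests)]
    best = max(nests, key=lambda n: n[2])
    for _ in range(n_iter):
        # Levy flight: perturb each nest and keep the candidate if it scores higher.
        for i, (pos, _mask, score) in enumerate(nests):
            cand = [x + alpha * levy_step(rng) for x in pos]
            cand_mask = binarize(cand, rng)
            cand_score = fitness(cand_mask)
            if cand_score > score:
                nests[i] = (cand, cand_mask, cand_score)
        # Abandon a fraction pa of the worst nests and rebuild them at random.
        nests.sort(key=lambda n: n[2])
        for i in range(int(pa * n_nests)):
            nests[i] = new_nest()
        # Remember the best solution seen so far.
        best = max(nests + [best], key=lambda n: n[2])
    return best[1], best[2]


if __name__ == "__main__":
    relevant = {0, 3, 5}  # hypothetical "useful" features for the toy fitness

    def toy_fitness(mask):
        # Reward selecting relevant features and leaving the others out.
        return sum((i in relevant) == bool(b) for i, b in enumerate(mask))

    print(binary_cuckoo_search(toy_fitness, n_features=8))
```

Passing the fitness in as a callable keeps the search independent of the classifier, so a wrapper that trains and scores an SVM on the masked columns could be substituted for the toy function without touching the search loop.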
Article link: https://www.wllwen.com/kejilunwen/zidonghuakongzhilunwen/2416775.html