
Research on a Particle Swarm Optimized Weighted Random Forest Algorithm

Published: 2018-06-14 00:15

  Topics: random forest + particle swarm; Source: Zhengzhou University, master's thesis, 2017


[Abstract]: The Random Forest (RF) algorithm is a classification model proposed by Breiman in 2001. In essence, it combines the Bagging (Bootstrap Aggregating) algorithm with Ho's Random Subspace algorithm and determines the final classification by a vote over the outputs of multiple decision trees. Since its introduction, random forest has been widely applied to data mining and classification problems, and many researchers have proposed improvements to the model. Random forest is an efficient classification algorithm: it requires no background knowledge of the samples, needs no variable selection, and tolerates noise well, so much of the tedious work of data preprocessing can be skipped. However, the voting mechanism gives decision trees with low training accuracy the same voting power as every other tree, which lowers the accuracy of the vote. In addition, the number of decision trees and the other parameters of the model usually have a considerable influence on the final classification result.

Focusing on decision trees with poor training accuracy and comparatively unreliable votes, this thesis runs detailed experiments on the traditional random forest algorithm and analyzes the results, and largely identifies the cause of its performance shortfall: the equal voting power granted to low-accuracy trees has a considerable effect on the accuracy of the final classification. During classification, several classes may also tie for the highest number of votes, which makes the sample hard to classify; this thesis defines this as the "deadlock phenomenon".

To address the classification difficulties caused by low-accuracy trees and by ties among the top-voted classes, this thesis proposes, on the basis of the traditional random forest model, an Accuracy Weighted Random Forest (AWRF) algorithm, in which each decision tree's vote is multiplied by a weight proportional to its training accuracy. Because the model's parameters are hard to choose by hand, particle swarm optimization (PSO) is used to iteratively optimize the parameters that affect the new model. Simulation experiments are carried out in Matlab on six standard data sets from the UCI database, and the new model is then compared with other algorithms to assess its strengths and weaknesses. The comparison indicates that the new model has an advantage when classifying this type of data.
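The weighted vote described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation (the experiments there were run in Matlab). The abstract only says that the weight is proportional to training accuracy, so using the accuracy itself as the weight is an assumption, and the function names, tree predictions, and accuracy values below are all hypothetical.

```python
from collections import defaultdict


def majority_vote(predictions):
    """Plain random-forest vote: every tree counts once.

    Returns every class that shares the top count, so a tie (the
    "deadlock" case) is visible instead of being silently broken.
    """
    counts = defaultdict(float)
    for label in predictions:
        counts[label] += 1.0
    best = max(counts.values())
    return sorted(label for label, c in counts.items() if c == best)


def accuracy_weighted_vote(predictions, accuracies):
    """Weighted vote: each tree's vote is scaled by its training accuracy.

    Assumption: the weight is the accuracy itself; the abstract only
    states that the weight is proportional to it.
    """
    if len(predictions) != len(accuracies):
        raise ValueError("need exactly one accuracy per tree prediction")
    scores = defaultdict(float)
    for label, accuracy in zip(predictions, accuracies):
        scores[label] += accuracy
    best = max(scores.values())
    return sorted(label for label, s in scores.items() if s == best)


# Four hypothetical trees: two vote "A", two vote "B".
tree_predictions = ["A", "B", "A", "B"]
tree_accuracies = [0.9, 0.6, 0.8, 0.7]  # made-up training accuracies

unweighted_winners = majority_vote(tree_predictions)
weighted_winners = accuracy_weighted_vote(tree_predictions, tree_accuracies)

print(unweighted_winners)  # both classes tie under equal votes
print(weighted_winners)    # the more accurate trees decide the outcome
```

With equal votes the two classes tie, while the weighted scores (about 1.7 for "A" against about 1.3 for "B") separate them. Weighting makes ties less likely but does not rule them out, which is why both functions return a list of winners.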
[Degree-granting institution]: Zhengzhou University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP18;TP311.13



