
Research on a Cost-Sensitive Classification Algorithm Based on ProGEP

Published: 2018-05-10 21:42

  Topics: data mining + cost-sensitive; Source: Anhui University of Finance and Economics, master's thesis, 2015


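[Abstract]: In recent years, data mining techniques have been widely applied in marketing, business management, enterprise crisis management, product manufacturing, the Internet, and other areas. The volume of unused data stored on computers worldwide is still growing rapidly, and data types and structures are becoming increasingly complex, which poses serious challenges for reducing mining costs and improving algorithm performance. Improving the workflow of mining algorithms and raising their runtime efficiency is therefore important for obtaining satisfactory mining results efficiently.

This thesis studies and improves gene expression programming (GEP), a newer algorithm derived from the genetic algorithms commonly used in data mining. It proposes and designs the ProGEP algorithm, applies it to the cost-sensitive classification problem, and designs and implements the CSC-ProGEP algorithm. The main work covers the following four aspects:

1. The state of research at home and abroad on GEP and cost-sensitive learning algorithms is surveyed; the structure and workflow of the GEP algorithm are outlined; several commonly used cost-sensitive classification algorithms are briefly described.

2. GEP is improved and the ProGEP algorithm is proposed. First, basic GEP evaluates chromosomes by repeatedly traversing the expression tree, which is inefficient. After studying the idea behind a currently popular improved algorithm, the gene reading operator, the thesis proposes reverse Polish expression, stack-based evaluation (RPE_SD): a single post-order traversal of the expression tree yields the reverse Polish expression, which is stored and computed in a linear stack structure that can be read repeatedly, improving the efficiency of chromosome evaluation. Second, basic GEP specifies no concrete method for generating constant parameters and generates its initial population completely at random. The thesis argues that supplying reasonable constants is necessary and that inserting superior individuals into the population promotes the early stage of evolution, and it proposes rough multiple linear regression initialization with adaptive constant correction (RMLR_AC). This algorithm introduces the full-variable coefficients obtained by multiple regression into the chromosome's gene expression structure as constants and corrects them through the evolutionary process. Third, observation shows that individuals with identical genotypes occur in the evolving population of basic GEP. The thesis defines the concepts of duplicate chromosomes and implicitly duplicate chromosomes, identifies the causes of the phenomenon and its adverse effects on gene fragment diversity and evolutionary efficiency, as well as its harmful assimilation of other individuals in the population, proposes an algorithm for eliminating (implicit) duplicate individuals (DSC), and improves the GEP selection process by creating a copy of the population for secondary selection (CPCSC). Finally, after further observation of the population structure, the thesis identifies and defines the phenomena of same-family chromosomes and racial faults in GEP. To keep these phenomena from obstructing the exchange of gene fragments across the whole population and from driving the evolutionary result toward a local optimum, it proposes thread-based periodic population diversity differentiation (TM_PDI) to improve the evolutionary process, together with an algorithm for initializing sub-thread populations that sorts the main thread's population, clones it in segments, and supplements it with randomized individuals (SHS_RRI). Combining basic GEP with these four improvements, the thesis proposes and describes the ProGEP algorithm.

3. ProGEP is applied to the cost-sensitive classification problem. The CSC-ProGEP algorithm is obtained by constructing a cost-sensitive matrix and incorporating it into ProGEP's fitness function. Building on a description of the algorithm's workflow, the thesis gives a method for evaluating classification performance on rare classes.

4. The experimental environment is built, and the algorithms are validated and applied. Because the gene evaluation algorithm, selection process, and evolutionary process of basic GEP were all modified, and in order to describe algorithm details conveniently and compute experimental statistics flexibly, the basic GEP model structure and the ProGEP improvements were implemented in C# with an object-oriented design, using Microsoft Visual Studio 2012. Experiments verified the performance of ProGEP and the effectiveness of CSC-ProGEP in application. To observe the gain from each improvement independently, the four improvements were introduced into GEP step by step, and the results before and after each introduction were compared over multiple runs. After the effectiveness of ProGEP was verified, five UCI data sets were selected for CSC experiments using 10-fold cross-validation, and the resulting classifiers were compared with classifiers trained by other classification algorithms. The experiments show that, on cost-sensitive classification problems, CSC-ProGEP achieves higher rare-class recall and precision than traditional classification algorithms (C4.5, BN, BP) and a cost-sensitive classification algorithm (AdaCost) while maintaining classification accuracy.

The significance of this research is twofold. On the one hand, it refines and advances the theory of the GEP algorithm: the improvements to chromosome evaluation efficiency, population structure, and the evolutionary process enrich its theoretical study. On the other hand, it extends the practical application of GEP: the mining experiments with CSC-ProGEP validate the ProGEP algorithm, which offers some guidance for rare-class mining applications such as predicting whether a patient has a disease and guarding against fraudulent customers.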
[Abstract]: In recent years, data mining has been widely used in marketing, business management, enterprise crisis management, product manufacturing, and the Internet.

In this paper, we study and improve gene expression programming (GEP), a newer algorithm derived from the genetic algorithms commonly used in data mining. We propose and design the ProGEP algorithm, apply it to the cost-sensitive classification problem, and design and implement the CSC-ProGEP algorithm. The main work covers the following four aspects:

1. The research status of GEP and cost-sensitive learning algorithms at home and abroad is reviewed.
The structure and workflow of the GEP algorithm are summarized.
Several common cost-sensitive classification algorithms are briefly described.

2. The GEP algorithm is improved and the ProGEP algorithm is proposed. First, after studying the idea behind a current improved algorithm, the gene reading operator, we propose reverse Polish expression, stack-based evaluation (RPE_SD): the reverse Polish expression is obtained by a single post-order traversal of the expression tree and is then stored and evaluated with a linear stack.
Second, because basic GEP specifies no method for generating constant parameters and generates its initial population completely at random, we propose rough multiple linear regression initialization with adaptive constant correction (RMLR_AC). The algorithm introduces the coefficients obtained by multiple regression into the chromosome's gene expression structure as constants and corrects them through the evolutionary process.
Third, we observe that individuals in the evolving population of basic GEP can share the same genotype, and we define the concepts of duplicate chromosomes and implicitly duplicate chromosomes. We identify the causes of this phenomenon and its adverse effects on gene fragment diversity and evolutionary efficiency, along with its harmful assimilation of other individuals in the population. We propose an algorithm for eliminating (implicit) duplicate individuals (DSC) and improve the GEP selection process by creating a copy of the population for secondary selection (CPCSC).
Finally, we examine the structure of the population and identify and define the phenomena of same-family chromosomes and racial faults in GEP. To keep these phenomena from blocking the exchange of gene fragments across the population and from driving evolution toward a local optimum, we propose thread-based periodic population diversity differentiation (TM_PDI) to improve the evolutionary process, together with an algorithm for initializing the sub-thread populations (SHS_RRI). Combining the basic GEP algorithm with these four improvements, we propose and describe the ProGEP algorithm.
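The stack-based evaluation idea in item 2 can be illustrated with a minimal Python sketch. It is not the thesis's C# implementation: the `Node` class, the three-operator function set `OPS`, and the sample data are assumptions made for illustration. The point it shows is that the tree is traversed once in post-order, and the resulting reverse Polish token list is then reused for every sample.

```python
import operator
from dataclasses import dataclass
from typing import Optional

# Hypothetical binary function set; the abstract does not list the one used in the thesis.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


@dataclass
class Node:
    symbol: str                      # operator symbol or variable name
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def to_rpn(node):
    """Post-order traversal: children first, then the node itself."""
    if node is None:
        return []
    return to_rpn(node.left) + to_rpn(node.right) + [node.symbol]


def eval_rpn(rpn, sample):
    """Evaluate a reverse Polish token list against one sample using a stack."""
    stack = []
    for token in rpn:
        if token in OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(sample[token])  # variable lookup
    return stack.pop()


# Expression tree for (a + b) * (a - c)
tree = Node("*", Node("+", Node("a"), Node("b")), Node("-", Node("a"), Node("c")))
rpn = to_rpn(tree)  # built once, reused for every sample
samples = [{"a": 3, "b": 2, "c": 1}, {"a": 1, "b": 1, "c": 4}]
results = [eval_rpn(rpn, s) for s in samples]
```

Here `rpn` is a flat list, so evaluating a chromosome against many training samples needs no further tree traversal.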

3. ProGEP is applied to the cost-sensitive classification problem. The CSC-ProGEP algorithm is obtained by constructing a cost-sensitive matrix and incorporating it into ProGEP's fitness function. Building on a description of the algorithm's workflow, this paper presents a method for evaluating classification performance on rare classes.
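One way a cost matrix can enter a fitness function is sketched below in Python. The abstract does not give the actual CSC-ProGEP fitness formula, so the cost values, the `COST[actual][predicted]` layout, and the mapping `1 / (1 + total cost)` are all assumptions chosen only to show the principle that a missed rare-class case is penalized more heavily than a false alarm.

```python
# Hypothetical cost matrix indexed as COST[actual][predicted]; class 1 is the rare class.
COST = {
    0: {0: 0.0, 1: 1.0},  # false alarm on the common class
    1: {0: 5.0, 1: 0.0},  # missing a rare-class case costs more
}


def total_cost(actual, predicted, cost=COST):
    """Sum the misclassification cost over all samples."""
    return sum(cost[a][p] for a, p in zip(actual, predicted))


def fitness(actual, predicted, cost=COST):
    """Map total cost into (0, 1]; a lower cost gives a higher fitness."""
    return 1.0 / (1.0 + total_cost(actual, predicted, cost))


actual = [0, 0, 0, 1, 1]
predicted_a = [0, 0, 1, 1, 1]  # one false alarm
predicted_b = [0, 0, 0, 1, 0]  # one missed rare-class case
```

Both candidate classifiers make exactly one error, yet `predicted_a` receives the higher fitness because its error is the cheaper kind.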

4. The experimental environment is built and the algorithms are validated and applied. To describe algorithm details conveniently and compute experimental statistics flexibly, the basic GEP model and the ProGEP improvements were implemented in C#, and experiments verified the performance of ProGEP and the effectiveness of CSC-ProGEP. After ProGEP was validated, CSC experiments showed that, on cost-sensitive classification problems, CSC-ProGEP achieves higher rare-class recall and precision than traditional classification algorithms (C4.5, BN, BP) and a cost-sensitive classification algorithm (AdaCost) while maintaining classification accuracy.
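The three measures named in item 4 (accuracy, rare-class recall, rare-class precision) use their standard definitions, which the following Python sketch computes. The function name `rare_class_metrics`, the choice of label `1` for the rare class, and the toy labels are assumptions for illustration and are not data from the thesis.

```python
def rare_class_metrics(actual, predicted, rare_label=1):
    """Return accuracy plus recall and precision for the rare class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == rare_label and p == rare_label)
    fp = sum(1 for a, p in pairs if a != rare_label and p == rare_label)
    fn = sum(1 for a, p in pairs if a == rare_label and p != rare_label)
    return {
        "accuracy": sum(1 for a, p in pairs if a == p) / len(pairs),
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


# Toy labels: 6 common-class samples and 4 rare-class samples.
actual = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
predicted = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]
metrics = rare_class_metrics(actual, predicted)
```

Reporting recall and precision for the rare class alongside accuracy matters because a classifier that always predicts the common class can score high accuracy while finding no rare cases at all.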

On the one hand, this research refines and advances the theory of the GEP algorithm: the improvements to chromosome evaluation efficiency, population structure, and the evolutionary process enrich its theoretical study.
On the other hand, it extends the practical application of GEP. The mining experiments with CSC-ProGEP validate the ProGEP algorithm, which offers some guidance for rare-class mining applications such as predicting disease and guarding against fraudulent customers.

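[Degree-granting institution]: Anhui University of Finance and Economics
[Degree level]: Master's
[Year degree granted]: 2015
[Classification number]: TP181;TP311.13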

