Bayesian Classification Model Based on Random Forest Feature Selection and Its Application
Published: 2018-10-16 14:26
[Abstract]: Bayesian analysis is a method for studying uncertainty that expresses the degree of uncertainty as a probability. Classification models built on this method are interpretable and highly accurate, and they are now widely used in many fields. With the rapid development of China's economy, credit evaluation has also become a topic of growing interest. Based on the characteristics of credit-evaluation data, this thesis proposes a Bayesian classification model based on random forest feature selection and evaluates it empirically on the German credit dataset from the UCI repository. The results show that random forest feature selection both simplifies the structure of the Bayesian classifier and improves its classification performance. The main contributions and innovations are as follows. (1) Random forest is a noise-tolerant and highly stable learning algorithm; feature selection based on it can screen the feature variables and remove redundant or irrelevant attributes. Combining this with the naive Bayes model, which classifies well, the thesis builds a naive Bayes classifier based on random forest feature selection (RF-NB). (2) In practice, the "independence assumption" of naive Bayes often does not hold. To make the model more realistic, the thesis uses the tree-augmented naive Bayes (TAN) model, which better represents dependencies among feature attributes, and builds a tree-augmented naive Bayes classifier based on random forest feature selection (RF-TAN). (3) The Bayesian classifiers based on random forest feature selection are applied to credit evaluation on the German dataset to verify the classification performance of the proposed RF-NB and RF-TAN models, and they are compared experimentally with NB and TAN models trained without feature selection. The experimental results show that RF-NB and RF-TAN clearly outperform the NB and TAN models.
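The RF-NB idea in contribution (1), ranking features with a random forest, keeping the highest-ranked ones, and then fitting naive Bayes on the reduced feature set, can be sketched in Python with scikit-learn. This is a minimal sketch under stated assumptions, not the thesis's implementation: it ranks features by scikit-learn's impurity-based `feature_importances_`, keeps a hypothetical cutoff of `k = 7` features, uses Gaussian naive Bayes, and replaces the German credit dataset with synthetic data from `make_classification`. The German data are largely categorical, so a categorical or discretized naive Bayes would be closer to the real setting. `rf_select_features` and `fit_rf_nb` are illustrative names.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB


def rf_select_features(X, y, k, n_estimators=200, random_state=0):
    """Return the sorted indices of the k features with the highest random-forest importance."""
    rf = RandomForestClassifier(n_estimators=n_estimators, random_state=random_state)
    rf.fit(X, y)
    # feature_importances_ is scikit-learn's impurity-based (mean decrease in impurity) score.
    ranked = np.argsort(rf.feature_importances_)[::-1]
    return np.sort(ranked[:k])


def fit_rf_nb(X, y, k):
    """RF-NB sketch: select k features with a random forest, then fit naive Bayes on them only."""
    selected = rf_select_features(X, y, k)
    return selected, GaussianNB().fit(X[:, selected], y)


# Synthetic stand-in for a credit dataset (assumption): 20 features, 5 informative.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           n_redundant=2, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=42)

nb = GaussianNB().fit(X_tr, y_tr)                # baseline NB: all features
selected, rf_nb = fit_rf_nb(X_tr, y_tr, k=7)     # RF-NB: top-7 features (hypothetical k)

print("selected feature indices:", selected)
print("NB accuracy:   ", nb.score(X_te, y_te))
print("RF-NB accuracy:", rf_nb.score(X_te[:, selected], y_te))
```

The random forest is fitted on the training split only, so the selection step does not leak information from the test set into the accuracy estimate. Whether RF-NB beats plain NB depends on the data; the printed numbers are not a reproduction of the thesis's results.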
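Contribution (2) relaxes the independence assumption with tree-augmented naive Bayes (TAN). In the standard TAN construction (Friedman, Geiger and Goldszmidt), each feature pair is weighted by the conditional mutual information I(X_i; X_j | C), a maximum-weight spanning tree is kept and directed away from a root, and the class becomes a parent of every feature. The plain-Python sketch below implements that construction for discrete features. Choosing feature 0 as the root, using Laplace smoothing with `alpha = 1`, and the toy data are all assumptions, and the thesis's own structure-learning details may differ. In an RF-TAN pipeline, `data` would contain only the columns kept by the random forest step.

```python
import math
from collections import Counter


def cond_mutual_info(data, labels, i, j):
    """Empirical conditional mutual information I(X_i; X_j | C) for discrete columns i and j."""
    n = len(labels)
    n_ijc = Counter((row[i], row[j], c) for row, c in zip(data, labels))
    n_ic = Counter((row[i], c) for row, c in zip(data, labels))
    n_jc = Counter((row[j], c) for row, c in zip(data, labels))
    n_c = Counter(labels)
    cmi = 0.0
    for (xi, xj, c), count in n_ijc.items():
        # P(xi, xj, c) * log[P(xi, xj | c) / (P(xi | c) * P(xj | c))]
        cmi += (count / n) * math.log(count * n_c[c] / (n_ic[(xi, c)] * n_jc[(xj, c)]))
    return cmi


def tan_structure(data, labels, root=0):
    """Maximum-weight spanning tree over the features (Prim), directed away from root.

    Returns parent[i], the feature parent of feature i (None for the root).
    """
    d = len(data[0])
    weights = {(i, j): cond_mutual_info(data, labels, i, j)
               for i in range(d) for j in range(i + 1, d)}

    def weight(edge):
        a, b = edge
        return weights[(min(a, b), max(a, b))]

    parent = [None] * d
    in_tree = {root}
    while len(in_tree) < d:
        # Add the heaviest edge connecting the tree to a feature outside it.
        a, b = max(((a, b) for a in in_tree for b in range(d) if b not in in_tree), key=weight)
        parent[b] = a
        in_tree.add(b)
    return parent


class TAN:
    def __init__(self, alpha=1.0, root=0):
        self.alpha = alpha  # Laplace smoothing strength (assumed value)
        self.root = root    # root feature of the tree (assumed choice)

    def fit(self, data, labels):
        self.parent = tan_structure(data, labels, self.root)
        d = len(data[0])
        self.classes = sorted(set(labels))
        self.n_values = [len({row[i] for row in data}) for i in range(d)]
        self.n = len(labels)
        self.class_count = Counter(labels)
        # Counts of (feature value, class, parent value) and of (class, parent value).
        self.joint = [Counter() for _ in range(d)]
        self.context = [Counter() for _ in range(d)]
        for row, c in zip(data, labels):
            for i, pa in enumerate(self.parent):
                pv = None if pa is None else row[pa]
                self.joint[i][(row[i], c, pv)] += 1
                self.context[i][(c, pv)] += 1
        return self

    def log_posterior(self, row, c):
        """Unnormalized log P(c) + sum_i log P(x_i | c, x_parent(i)), Laplace-smoothed."""
        a = self.alpha
        lp = math.log((self.class_count[c] + a) / (self.n + a * len(self.classes)))
        for i, pa in enumerate(self.parent):
            pv = None if pa is None else row[pa]
            num = self.joint[i][(row[i], c, pv)] + a
            den = self.context[i][(c, pv)] + a * self.n_values[i]
            lp += math.log(num / den)
        return lp

    def predict(self, rows):
        return [max(self.classes, key=lambda c: self.log_posterior(row, c)) for row in rows]


# Hypothetical toy data: three binary features, binary class.
data = [[0, 0, 1], [0, 0, 0], [1, 1, 1], [1, 1, 0],
        [0, 0, 1], [1, 1, 1], [1, 0, 0], [0, 1, 0]]
labels = [0, 0, 1, 1, 0, 1, 1, 0]
model = TAN().fit(data, labels)
print("feature parents:", model.parent)
print("predictions:", model.predict([[0, 0, 1], [1, 1, 0]]))
```

Each feature has at most one feature parent in addition to the class, which is what keeps TAN close to naive Bayes in cost while still modeling pairwise dependencies. For RF-TAN, one would first reduce each row to the selected columns, e.g. `[[row[i] for i in selected] for row in data]`, before calling `fit`.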
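[Degree-granting institution]: North China University of Water Resources and Electric Power
[Degree level]: Master's
[Year conferred]: 2017
[Classification number]: F224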
Article No.: 2274650
Article link: https://www.wllwen.com/jingjifazhanlunwen/2274650.html