Research and Application of a Decision Tree Algorithm Incorporating Attribute Association Degree
[Abstract]: In the new century, challenges and opportunities coexist, and the ability to use and manage massive data bears on the future development of every industry. In-depth exploration of big data makes it possible to analyze data models at a more macroscopic level, discover latent regularities, and forecast future trends reasonably, yielding deeper, more effective, and more comprehensive information. Research on data mining algorithms therefore has both scientific and practical value. Building on the classical C4.5 decision tree algorithm, this paper uses the Apriori association rule algorithm to incorporate the degree of association among the attributes of the data source into the subsequent decision tree computation. When selecting a splitting attribute, the traditional C4.5 algorithm considers only the correlation between the candidate attribute and the class attribute; it ignores the degree of association among the non-class attributes, which determines how redundant the attributes are with one another. To reduce the influence of redundancy, this paper uses the idea of information gain to measure the association between the candidate attribute and the other non-class attributes, and adds this measure to the original algorithm to produce more reliable splitting attributes. In addition, to address insufficient information when constructing the decision tree model, the paper uses the Apriori algorithm to generate a series of strong rules; according to the proposed new attribute selection criterion, new attributes are screened from these strong rules and added to the original attribute set to expand the amount of information, thereby improving the prediction accuracy of the C4.5 algorithm. The information contained in a sample is often diverse and rich.
The traditional decision tree algorithm reveals the degree of "attribute-class" association, whereas the degree of association between attributes is a kind of horizontal analysis of the data set. Analyzing the relationships between pairs of attributes makes the analysis framework more multidimensional and the results more usable. Finally, the paper applies the method to a practical case: using historical data, it identifies the primary and secondary factors that affect whether customers join a gym, builds a model, predicts the relevant attributes, and finds the customer groups that are interested and valuable. This further demonstrates the practical value of the algorithm in a real-world setting.
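The abstract does not give the exact form of the new attribute selection criterion, so the following is only a minimal sketch of the general idea. The entropy, information gain, and gain ratio functions follow the standard C4.5 definitions. The `redundancy` and `adjusted_score` functions, the `penalty` weight, and the toy gym data set are assumptions made for illustration and are not taken from the thesis.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def info_gain(xs, ys):
    """Information gain of ys when the data is partitioned by xs: H(ys) - H(ys | xs)."""
    n = len(xs)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(ys) - conditional


def gain_ratio(xs, ys):
    """Standard C4.5 gain ratio: information gain divided by split information."""
    split_info = entropy(xs)
    return info_gain(xs, ys) / split_info if split_info > 0 else 0.0


def redundancy(data, attr, others):
    """ASSUMPTION: mean information gain between attr and the other non-class attributes."""
    if not others:
        return 0.0
    return sum(info_gain(data[attr], data[o]) for o in others) / len(others)


def adjusted_score(data, attr, target, penalty=0.5):
    """HYPOTHETICAL criterion: gain ratio minus a weighted redundancy term."""
    others = [a for a in data if a not in (attr, target)]
    return gain_ratio(data[attr], data[target]) - penalty * redundancy(data, attr, others)


# Toy data (invented): "age_band" duplicates "age", so the two are fully redundant.
data = {
    "age": ["young", "young", "old", "old", "young", "old"],
    "age_band": ["y", "y", "o", "o", "y", "o"],
    "weekend": ["yes", "no", "yes", "no", "no", "yes"],
    "joined": ["yes", "yes", "no", "no", "yes", "no"],
}

for attr in ("age", "age_band", "weekend"):
    print(attr, round(gain_ratio(data[attr], data["joined"]), 4),
          round(adjusted_score(data, attr, "joined"), 4))
```

In this sketch the class attribute is `joined`, and an attribute that shares much information with the other non-class attributes has its score lowered. How the thesis actually combines the two measures may differ.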
[Degree-granting institution]: Shandong University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP311.13