Research on Multi-Target Prediction Algorithms Based on AdaBoost Regression Trees

Published: 2018-10-19 08:04

[Abstract]: In the big-data era, making good use of the latent information in data has become a key factor in whether large businesses can achieve rapid growth, and accurate target prediction is of great significance to the development of the national economy. In business, economics, and other fields, predictive analysis increasingly relies on data-mining models, statistics, and analysis. With the growth of Internet finance, target prediction has become a topic of real practical relevance. Multi-target regression, a young and emerging area of data mining, is closely related to multi-label classification. Classification learning is currently more mature and more widely applied, and recent advances in multi-label classification motivated us to explore whether they can be carried over successfully to multi-target regression. Multi-target prediction problems already arise in many fields, such as predicting the wind noise of vehicle components, stock forecasting, and ecological modeling. These problems involve predicting several target variables simultaneously from the same input variables. When the targets are binary, the task is called multi-label classification; when they are real-valued, it is called multi-target regression.

In this thesis we propose two new multi-target regression methods, Multi-Target Stacking (MTS) and Ensemble of Regressor Chains (ERC), both inspired by popular multi-label classification methods. In the first training stage, MTS and ERC both build single-target regression models using the AdaBoost algorithm over regression trees (ART), which also serves as the baseline method. In the second stage, both methods expand the input space by adding the first-stage target predictions as extra input variables, and the multi-target prediction model is trained on this expanded space. Both methods exploit relationships among the target variables. ERC additionally takes the order of the targets into account, not only their dependencies. We also identify shortcomings of MTS and ERC and modify both algorithms accordingly, yielding the improved variants MTS Corrected (MTSC) and ERC Corrected (ERCC).

Another important contribution of this thesis is a collection of real-world data from 12 different domains. Six methods (ART, MORF, MTS, MTSC, ERC, and ERCC) are compared experimentally on these 12 large and diverse datasets. The results show that the corrected regressor chain method ERCC performs best on multi-target prediction. It clearly outperforms single-target ART and the state-of-the-art multi-objective random forest (MORF), and it also outperforms the other proposed methods, MTS, MTSC, and ERC. The results further show that the corrected variants MTSC and ERCC significantly improve the performance of MTS and ERC, respectively.
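The two-stage scheme in the abstract can be illustrated with a small, self-contained Python sketch. This is not the thesis's implementation, and every function name (`fit_stump`, `fit_art`, `stage_inputs`, `fit_mts`, `fit_erc`) is hypothetical. Several assumptions should be kept in mind. ART is approximated by AdaBoost.R2 with linear loss over depth-1 regression trees (stumps), and each stump is fitted on reweighted rows instead of resampled ones. The thesis may use deeper trees. ERC draws random chain orders and averages the predictions of the chains. Following the abstract, stage-two inputs are first-stage predictions. Because the abstract does not say what the MTSC/ERCC correction actually is, the `k > 1` option assumes one correction used in the multi-target regression literature: feeding the second stage k-fold out-of-fold estimates of the targets instead of in-sample predictions.

```python
import math
import random

def fit_stump(X, y, w):
    """Weighted least-squares regression stump (one split on one feature); stand-in for a full tree."""
    rows = range(len(y))
    def wmean(idx):
        return sum(w[i] * y[i] for i in idx) / sum(w[i] for i in idx)
    def sse(idx, mu):
        return sum(w[i] * (y[i] - mu) ** 2 for i in idx)
    mu = wmean(rows)
    best = (sse(rows, mu), 0, math.inf, mu, mu)  # constant predictor as fallback
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X})[:-1]:  # both sides of the split stay non-empty
            left = [i for i in rows if X[i][j] <= t]
            right = [i for i in rows if X[i][j] > t]
            ml, mr = wmean(left), wmean(right)
            err = sse(left, ml) + sse(right, mr)
            if err < best[0]:
                best = (err, j, t, ml, mr)
    _, feat, thr, lo, hi = best
    return lambda x: lo if x[feat] <= thr else hi

def fit_art(X, y, rounds=20):
    """AdaBoost.R2 (Drucker, 1997) with linear loss over stumps; an assumed stand-in for ART."""
    w = [1.0 / len(y)] * len(y)
    models = []  # (stump, log(1 / beta)) pairs
    for _ in range(rounds):
        h = fit_stump(X, y, w)  # reweighted rows instead of Drucker's weighted resampling
        err = [abs(h(x) - t) for x, t in zip(X, y)]
        worst = max(err)
        if worst == 0:  # perfect fit: keep this stump and stop
            models.append((h, 1.0))
            break
        loss = [e / worst for e in err]  # linear loss scaled to [0, 1]
        avg = sum(wi * li for wi, li in zip(w, loss))
        if avg >= 0.5:  # too weak to help: stop, keeping it only if nothing was kept yet
            models = models or [(h, 1.0)]
            break
        beta = avg / (1.0 - avg)
        models.append((h, math.log(1.0 / beta)))
        w = [wi * beta ** (1.0 - li) for wi, li in zip(w, loss)]  # shrink weights of easy rows
        total = sum(w)
        w = [wi / total for wi in w]
    def predict(x):  # weighted median of the stump outputs
        pairs = sorted((g(x), a) for g, a in models)
        half, acc = 0.5 * sum(a for _, a in pairs), 0.0
        for value, a in pairs:
            acc += a
            if acc >= half:
                return value
        return pairs[-1][0]
    return predict

def stage_inputs(X, y, fit, h, k):
    """Target estimates fed forward: in-sample if k <= 1, else k-fold out-of-fold (assumed correction)."""
    if k <= 1:
        return [h(x) for x in X]
    est = [0.0] * len(y)
    for f in range(k):
        train = [i for i in range(len(y)) if i % k != f]
        g = fit([X[i] for i in train], [y[i] for i in train])
        for i in range(f, len(y), k):
            est[i] = g(X[i])
    return est

def fit_mts(X, Y, fit=fit_art, k=0):
    """Multi-Target Stacking on feature rows X and target rows Y; k > 1 gives the assumed MTSC variant."""
    cols = [[row[j] for row in Y] for j in range(len(Y[0]))]
    first = [fit(X, c) for c in cols]  # stage 1: one single-target model per target
    est = [stage_inputs(X, c, fit, h, k) for c, h in zip(cols, first)]
    Z = [list(x) + [e[i] for e in est] for i, x in enumerate(X)]  # expanded input space
    second = [fit(Z, c) for c in cols]  # stage 2: refit each target on the expanded space
    def predict(x):
        z = list(x) + [h(x) for h in first]
        return [h(z) for h in second]
    return predict

def fit_erc(X, Y, fit=fit_art, chains=3, k=0, seed=0):
    """Ensemble of Regressor Chains over random target orders; k > 1 gives the assumed ERCC variant."""
    rng, m, ensemble = random.Random(seed), len(Y[0]), []
    for _ in range(chains):
        Z, links = [list(x) for x in X], []
        for j in rng.sample(range(m), m):  # one random chain order
            c = [row[j] for row in Y]
            h = fit(Z, c)
            links.append((j, h))
            for z, e in zip(Z, stage_inputs(Z, c, fit, h, k)):
                z.append(e)  # later links in this chain see this target's estimate
        ensemble.append(links)
    def predict(x):
        total = [0.0] * m
        for links in ensemble:
            z = list(x)
            for j, h in links:
                z.append(h(z))
                total[j] += z[-1]
        return [t / len(ensemble) for t in total]
    return predict
```

The single-target learner is passed in as the `fit` argument, which can be any function that takes rows and targets and returns a callable. This keeps the stacking and chaining logic independent of ART, so the same two-stage code could be rerun with another base learner for comparison. For example, `fit_erc(X, Y, k=5)` would return a predictor that maps one input row to a list of averaged target estimates.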
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year conferred]: 2017
[Classification code]: TP18
