A Capital Inflow and Outflow Prediction Model Based on Feature Extraction and a Step-by-Step Regression Algorithm

Published: 2018-05-25 06:41

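Topic of this thesis: capital flow prediction + feature extraction; Source: University of Science and Technology of China, 2017 master's thesis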

[Abstract]: The financial platforms of commercial companies often serve tens or even hundreds of millions of members, so their business inevitably involves large capital inflows and outflows every day. With financial data at this scale, the pressure on capital management is considerable. To keep liquidity risk to a minimum while still meeting the needs of daily operations, accurately forecasting capital inflows and outflows is particularly important. However, financial data are affected by social, political, and economic factors and by major events; the trends are unstable and noisy, which makes capital flow hard to forecast. Taking capital flow forecasting for financial platform users as its setting, this thesis aims to build an accurate and effective prediction model for capital inflow and outflow whose forecasts track the true values as closely as possible, in support of capital management. The main work and results are as follows:
1. Because the raw inflow and outflow data set has few obvious features, feature extraction is used to uncover relevant features, and a feature selection strategy then picks the optimal feature subset. Features related to the target values are constructed from three perspectives: time, user, and interest rate. The Pearson correlation coefficient is used for a preliminary screening of the most relevant features, after which the feature selection strategy removes weakly correlated and redundant features to form the optimal subset. The experiments show that the selected feature subset affects different regression algorithms differently. The best subset is reached with 12 feature columns for the purchase amount and 10 feature columns for the redemption amount, which gives good prediction results with most of the regression algorithms tested. This subset is therefore adopted as the optimal feature subset for the subsequent prediction algorithms.
2. To cope with the unstable, noisy data set, a step-by-step regression algorithm is trained on the feature subset to improve prediction accuracy. The thesis proposes a two-step feature prediction method. In the first step, grey prediction and time series algorithms forecast the features whose future values are unknown, and the forecast features are added to the known feature subset for the future period. In the second step, a BP neural network is trained on the full feature set to produce the final prediction. The method is compared with ensemble learning methods: an Adaboost-based gradient boosting regression tree and a Bagging-based random forest regression algorithm are each trained on the data set. The experimental results show that the two-step feature prediction algorithm reduces the prediction error relative to the other algorithms, and that some of the algorithms predict better than the ensemble learning methods.
3. The discrete features in the subset are one-hot encoded into a sparse representation. Because the factorization machine algorithm is effective on sparse data sets, it is used for the regression prediction. A factorization machine expresses interactions between variables well, which amounts to adding second-order cross features on top of the original feature variables and so describes the data set better. In addition, its algorithmic complexity is modest and it runs efficiently. The experiments show that the factorization machine algorithm can improve the prediction accuracy of capital inflow and outflow to some extent.
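The correlation-coefficient screening in the first contribution (taken here to be the Pearson coefficient) can be illustrated with a minimal sketch. The abstract does not give the thesis's actual features or cut-off, so the feature names, the toy values, and the `threshold` parameter below are all assumptions made for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def screen_features(features, target, threshold=0.3):
    """Keep the features whose |r| with the target reaches the threshold.

    `features` maps a feature name to its column of values. The threshold
    is a hypothetical cut-off, not a value taken from the thesis.
    """
    scores = {name: pearson(col, target) for name, col in features.items()}
    kept = [name for name, r in scores.items() if abs(r) >= threshold]
    # Strongest absolute correlation first.
    kept.sort(key=lambda name: abs(scores[name]), reverse=True)
    return kept, scores


# Toy data: hypothetical feature names and made-up values.
features = {
    "day_index": [1, 2, 3, 4, 5, 6, 7],
    "interest_rate": [4.0, 4.1, 4.3, 4.2, 4.6, 4.8, 5.0],
    "is_constant": [1, 1, 1, 1, 1, 1, 1],
}
purchase_amount = [10.0, 11.0, 13.5, 12.0, 16.0, 18.5, 20.0]

kept, scores = screen_features(features, purchase_amount, threshold=0.5)
print(kept)
```

A filter of this kind only ranks features one at a time against the target. Removing redundant features, as the abstract describes for the later selection stage, would need an additional step that compares features with each other.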
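The claim that a factorization machine effectively adds second-order cross features at modest cost can be made concrete with a small sketch of the standard second-order model. This is not the thesis's implementation: the weights, the latent factors, and the one-hot layout are invented, and training is omitted. The first function uses the usual linear-time rewriting of the pairwise term; the second spells out the explicit sum over feature pairs for comparison.

```python
def fm_predict(x, w0, w, V):
    """Second-order factorization machine prediction for one sample.

    x  : list of n feature values (mostly zeros after one-hot encoding)
    w0 : global bias
    w  : list of n linear weights
    V  : n x k list of latent factor vectors, one row per feature
    """
    k = len(V[0])
    # Only non-zero features contribute, which keeps sparse one-hot
    # input cheap to score.
    active = [(i, xi) for i, xi in enumerate(x) if xi != 0]
    linear = sum(w[i] * xi for i, xi in active)
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * xi for i, xi in active)
        s_sq = sum((V[i][f] * xi) ** 2 for i, xi in active)
        pairwise += s * s - s_sq
    return w0 + linear + 0.5 * pairwise


def fm_predict_naive(x, w0, w, V):
    """Same model, written as an explicit sum over feature pairs."""
    n = len(x)
    total = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            total += dot * x[i] * x[j]
    return total


# Hypothetical sample: two one-hot groups (3 slots + 2 slots).
x = [0, 1, 0, 1, 0]
w0 = 0.5
w = [0.1, 0.2, 0.3, 0.4, 0.5]
V = [
    [0.3, 0.1],
    [1.0, 2.0],
    [0.2, 0.7],
    [0.5, -1.0],
    [0.9, 0.4],
]

print(fm_predict(x, w0, w, V), fm_predict_naive(x, w0, w, V))
```

Both functions compute the same value. The first needs work proportional to the number of non-zero features times the factor dimension `k`, which is the property that makes the model practical on sparse one-hot data.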
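[Degree-granting institution]: University of Science and Technology of China
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: F224;F832.39;F724.6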

[References]