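A Capital Inflow and Outflow Prediction Model Based on Feature Extraction and Step-by-Step Regression Algorithms
Topics: capital flow prediction + feature extraction; Source: master's thesis, University of Science and Technology of China, 2017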
[Abstract]: The financial platforms of commercial companies often serve tens or even hundreds of millions of members, and their business involves large capital inflows and outflows every day. Managing funds at this scale is demanding: liquidity risk must be kept to a minimum while daily operations are still supported, so accurate prediction of capital inflow and outflow is particularly important. Financial data, however, is affected by social, political, and economic factors and by major events; its trends are unstable and noisy, which makes capital flow hard to predict. Taking the capital flow of a financial platform's users as its research background, this thesis aims to build an accurate and effective prediction model of capital inflow and outflow that tracks the true values as closely as possible and thereby supports fund management. The main contents and results are as follows:
1. Because the initial features of the inflow and outflow data set are not distinctive, feature extraction is used to mine relevant features, and a feature selection strategy then picks the optimal feature subset. Multiple features related to the target value are constructed from three perspectives (time, user, and interest rate) and are first screened with the Pearson correlation coefficient to keep the most relevant ones. A feature selection strategy then removes weakly relevant and redundant features to form the optimal feature subset. Experiments show that the selected subset affects different regression algorithms differently, and that the best subset is reached with 12 features for the purchase value and 10 features for the redemption value.
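The preliminary screening step described above can be illustrated with a minimal sketch. It covers only the Pearson-based filter, not the later removal of redundant features, and it is not the thesis's implementation: the feature names, the toy data, and the 0.5 threshold are all hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)

def screen_features(features, target, threshold):
    """Keep features whose |r| with the target meets the threshold.

    Returns the kept feature names (strongest correlation first)
    and the full name -> r mapping for inspection.
    """
    scores = {name: pearson(col, target) for name, col in features.items()}
    kept = [name for name, r in scores.items() if abs(r) >= threshold]
    kept.sort(key=lambda name: -abs(scores[name]))
    return kept, scores

# Hypothetical toy data: six days of purchase amounts and three candidate features.
target = [10.0, 12.0, 9.0, 15.0, 14.0, 8.0]
features = {
    "is_weekend": [0, 0, 1, 0, 0, 1],                    # time-based feature
    "interest_rate": [4.1, 4.3, 4.0, 4.8, 4.6, 3.9],     # rate-based feature
    "day_of_month": [1, 2, 3, 4, 5, 6],                  # time-based feature
}

kept, scores = screen_features(features, target, threshold=0.5)
for name in kept:
    print(f"{name}: r = {scores[name]:+.3f}")
```

Ranking by the absolute value of r keeps strongly negative features (here the weekend flag) as well as strongly positive ones. A Pearson filter only detects linear association, so a feature it drops may still be useful to a nonlinear model.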
This feature subset yields good prediction results for most of the regression algorithms tested, so it is adopted as the optimal feature subset for the subsequent prediction step.
2. To handle the instability and noise of the data set, a step-by-step regression algorithm is trained on the feature subset to improve prediction accuracy. The thesis proposes a two-step feature prediction method. The first step uses grey prediction and time series algorithms to predict the features that are still unknown for the future period, and adds these predicted features to the subset of features already known for that period. The second step trains a BP (backpropagation) neural network on the full feature set to produce the final prediction. The method is compared with ensemble learning: a gradient boosting regression tree, which the thesis describes as Adaboost-based, and a Bagging-based random forest regressor are each trained on the same data set. The experimental results show that the two-step feature prediction algorithm reduces the prediction error relative to the other algorithms, and that some of its variants outperform the ensemble learning methods.
3. The discrete features in the subset are one-hot encoded into a sparse representation. Because factorization machines are effective on sparse data, a factorization machine is used for the regression. A factorization machine models interactions between variables well, which amounts to adding second-order cross features on top of the original feature variables and so describes the data set more fully. Its computational complexity is also modest and it runs efficiently. Experiments show that the factorization machine improves the prediction accuracy of capital inflow and outflow to some extent.
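To make the "second-order cross features" of the factorization machine concrete, the following minimal sketch computes a second-order factorization machine prediction for one one-hot encoded sample. It is an illustration under stated assumptions, not the thesis's code: the two categorical features (weekday and a holiday flag), the latent dimension k = 2, and all parameter values are hypothetical and hand-picked rather than trained. The pairwise term uses the standard linear-time reformulation of the factorization machine, and a naive pairwise version is included for comparison.

```python
def one_hot(index, size):
    """Return a one-hot list of the given size with a 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def fm_predict(x, w0, w, V):
    """Second-order factorization machine prediction for one sample.

    x:  n feature values
    w0: global bias
    w:  n linear weights
    V:  n rows of k latent factors; <V[i], V[j]> weights the pair (i, j)
    The pairwise term is computed in O(k * n) instead of O(k * n^2).
    """
    n, k = len(x), len(V[0])
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise

def fm_predict_naive(x, w0, w, V):
    """Same model, with the pairwise interactions summed explicitly."""
    n, k = len(x), len(V[0])
    total = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(V[i][f] * V[j][f] for f in range(k))
            total += dot * x[i] * x[j]
    return total

# Hypothetical sample: weekday index 2 (of 7) and holiday flag 1 (of 2),
# concatenated into a sparse vector of length 9.
x = one_hot(2, 7) + one_hot(1, 2)

# Hypothetical, untrained parameters chosen only to make the arithmetic visible.
w0 = 1.0
w = [0.01 * i for i in range(9)]
V = [[0.1 * (i + 1), -0.05 * (i + 1)] for i in range(9)]

print(fm_predict(x, w0, w, V))
print(fm_predict_naive(x, w0, w, V))
```

With one-hot input only the active positions contribute, so the prediction reduces to the bias, the two active linear weights, and the inner product of the two active latent vectors. That inner product is the learned weight of the cross feature "this weekday and this holiday status".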
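[Degree-granting institution]: University of Science and Technology of China
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: F224; F832.39; F724.6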