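Research on Personal Credit Evaluation Methods for Internet Finance Based on Support Vector Machines
Published: 2018-05-05 12:03
Topic: support vector machine + bagging; Source: Zhejiang University of Finance and Economics, master's thesis, 2017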
[Abstract]: The rapid development of China's economy has raised residents' capacity for credit consumption, and the rapid development of Internet finance has made such consumption more convenient: credit products such as personal housing mortgage loans, personal microloans, and credit card consumer loans have sprung up. As China's economy becomes more credit-based, the role of credit consumption in driving economic growth is increasingly prominent, and residents' willingness and ability to consume on credit are rising steadily. Domestic Internet financial institutions have made personal consumer lending one of their future development strategies. However, their risk management for personal consumer loans is relatively weak, and their management tools and methods are still comparatively backward. Moreover, Internet financial institutions lack an effective personal credit evaluation method, which seriously hinders the development of personal credit business. An effective credit evaluation model can both increase the profits of Internet financial institutions and expand their lending scale, so research on personal credit evaluation methods is of great significance.
In the era of Internet finance, the way credit data is obtained has changed: credit data can come from traditional financial institutions, e-commerce data from e-commerce platforms, and social data from social platforms. The volume of credit data has grown substantially as a result, and the credit rating business faces great opportunities and challenges; without big data processing capability, the value hidden in massive credit data cannot be fully exploited. Internet financial institutions already use quantitative models to evaluate consumers' personal credit risk, and the credit evaluation model is one focus of that research.
The support vector machine (SVM) is a data-driven model: it processes data in a supervised learning process and requires no special assumptions about the data. Its advantages are more pronounced when data is abundant or easy to obtain, which has made it popular among researchers. Because the SVM generalizes relatively well compared with other models, this thesis proposes an SVM-based ensemble model. Against the background of the big data era, the thesis evaluates personal credit data from Internet finance and explores data analysis and integration.
Building on existing research, the thesis proposes RSBC-SVM, an SVM-based ensemble model that uses the SVM as its base learner and combines two common ensemble strategies, bagging and random subspace, with a correlation-minimization ensemble selection method. It also uses a pattern search algorithm for parameter optimization. RSBC-SVM is built in four stages. The first stage is data partitioning: the original data is split into an initial training set, a validation set, and a test set. The training set is used to train the individual learners, the validation set to select among them, and the test set to evaluate the resulting ensemble. Processing the initial training set with bagging and random subspace produces a number of new training subsets. The second stage is training the individual learners: an SVM is built on each new training subset and tuned with pattern search. Viewed per learner, searching for parameters with pattern search improves each learner's generalization ability; viewed across learners, pattern search gives each SVM different parameters, which increases the diversity of the individual learners. The third stage is selecting individual learners: the ensemble is pruned with the correlation-minimization method. A smaller ensemble reduces storage and prediction overhead, and pruning also increases the differences among the individual learners. The fourth and final stage is combining the models: a Sigmoid function first converts each SVM's decision value into a probability output, and the individual learners are then combined by simple averaging.
Finally, the thesis tests RSBC-SVM on personal credit data from Internet finance. The data is preprocessed before the experiments: missing values are imputed with a random forest method, outliers are removed with the box plot method, and variables are processed with a logarithmic transformation and normalization. In a comparison with five other models, the proposed model performs best, which gives it strong practical relevance.
The theoretical contribution of the thesis is an in-depth study of the SVM and a new ensemble model, RSBC-SVM, which enriches theoretical research on SVMs. One factor that affects the performance of an ensemble is the difference among its individual learners: the more diverse the learners, the better the ensemble performs. To increase diversity, earlier researchers focused on data perturbation, feature perturbation, and parameter perturbation, and neglected the selection of individual learners before combination. In the context of Internet finance, the thesis selects individual learners with the correlation-minimization ensemble selection method, providing a useful reference for research on learner selection in ensemble models. This research has theoretical significance in enriching the study of SVMs and practical significance in promoting the construction of China's credit system, improving risk management in the consumer credit market of China's Internet financial institutions, and furthering the development of China's consumer credit market.
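The first stage described in the abstract derives new training subsets from the initial training set with bagging and random subspace. A minimal sketch of that step follows, assuming each subset pairs a bootstrap sample of rows with a random subset of feature columns. The function names, the `subspace_ratio` parameter, and the toy matrix are illustrative and are not taken from the thesis.

```python
import random


def make_training_subsets(n_samples, n_features, n_subsets,
                          subspace_ratio=0.5, seed=0):
    """Return one (row_indices, feature_indices) pair per individual learner."""
    rng = random.Random(seed)
    n_sub_features = max(1, round(subspace_ratio * n_features))
    subsets = []
    for _ in range(n_subsets):
        # Bagging: bootstrap sample of rows, drawn with replacement.
        rows = [rng.randrange(n_samples) for _ in range(n_samples)]
        # Random subspace: feature columns drawn without replacement.
        cols = sorted(rng.sample(range(n_features), n_sub_features))
        subsets.append((rows, cols))
    return subsets


def take(X, rows, cols):
    """Materialize one training subset from a list-of-lists data matrix."""
    return [[X[r][c] for c in cols] for r in rows]


# Toy 6x4 data matrix (hypothetical values, for illustration only).
X = [[i, i * 2, i * 3, i * 4] for i in range(6)]
for rows, cols in make_training_subsets(len(X), 4, n_subsets=3):
    subset = take(X, rows, cols)
    print(len(subset), "rows,", len(subset[0]), "features, columns", cols)
```

Each pair would feed one base SVM, so the learners differ both in the rows they see (data perturbation) and in the columns they see (feature perturbation).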
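The second stage tunes each SVM with a pattern search algorithm. The abstract does not say which variant is used, so the sketch below assumes a basic coordinate (compass) pattern search: poll one step in each direction along every coordinate, move when the objective improves, and halve the step when no poll point improves. The objective here is a toy stand-in for validation error over two hypothetical hyperparameters (for example log2 of a penalty and of a kernel width); it is not the thesis's objective.

```python
def pattern_search(objective, start, step=1.0, min_step=1e-3, max_iter=1000):
    """Minimize `objective` by polling +/- step along each coordinate."""
    point = list(start)
    best = objective(point)
    for _ in range(max_iter):
        if step < min_step:
            break
        improved = False
        for i in range(len(point)):
            for delta in (step, -step):
                trial = list(point)
                trial[i] += delta
                value = objective(trial)
                if value < best:
                    # Accept the better poll point and keep polling from it.
                    point, best, improved = trial, value, True
        if not improved:
            step /= 2.0  # no poll point improved: shrink the step
    return point, best


def toy_validation_error(params):
    """Hypothetical error surface with its minimum at (3, -5)."""
    log2_c, log2_gamma = params
    return (log2_c - 3.0) ** 2 + (log2_gamma + 5.0) ** 2


best_params, best_error = pattern_search(toy_validation_error, [0.0, 0.0])
print(best_params, best_error)
```

Because each training subset has its own error surface, running the search once per learner tends to yield different parameters per SVM, which is the parameter-level diversity the abstract mentions.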
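The third stage prunes the ensemble with a correlation-minimization selection method. The abstract gives no formula, so the following is only one plausible reading: compute the Pearson correlation between learners' outputs on the validation set and greedily drop the learner whose mean absolute correlation with the remaining learners is highest. The function names and the validation outputs are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length output vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # constant output: treated as uncorrelated (assumption)
    return cov / math.sqrt(var_a * var_b)


def select_least_correlated(outputs, keep):
    """Greedily drop the learner most correlated with the others."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    selected = list(range(len(outputs)))
    while len(selected) > keep:
        def mean_abs_corr(i):
            others = [j for j in selected if j != i]
            total = sum(abs(pearson(outputs[i], outputs[j])) for j in others)
            return total / len(others)
        selected.remove(max(selected, key=mean_abs_corr))
    return selected


# Hypothetical validation-set outputs of four learners on five applicants.
validation_outputs = [
    [0.9, 0.8, 0.2, 0.1, 0.7],
    [0.7, 0.65, 0.35, 0.3, 0.6],  # linear function of learner 0
    [0.6, 0.9, 0.4, 0.3, 0.2],
    [0.2, 0.7, 0.1, 0.6, 0.9],
]
print(select_least_correlated(validation_outputs, keep=2))
```

Learners 0 and 1 are perfectly correlated here, so one of them is pruned first; a real implementation would probably also weigh each learner's validation accuracy, which this sketch ignores.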
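The fourth stage converts each SVM's decision value into a probability with a Sigmoid function and then combines the learners by simple averaging. A minimal sketch, assuming the Platt-style form P(y=1 | f) = 1 / (1 + exp(a*f + b)); the default values a = -1 and b = 0 are placeholders (in practice they would be fitted on held-out data), and the decision values are made up for illustration.

```python
import math


def sigmoid_probability(decision_value, a=-1.0, b=0.0):
    """Map a decision value f to P(y=1) = 1 / (1 + exp(a*f + b))."""
    z = a * decision_value + b
    # Numerically stable evaluation of 1 / (1 + exp(z)).
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def ensemble_probability(decision_values, params=None):
    """Simple average of the per-learner probabilities for one applicant."""
    if params is None:
        params = [(-1.0, 0.0)] * len(decision_values)
    probs = [sigmoid_probability(f, a, b)
             for f, (a, b) in zip(decision_values, params)]
    return sum(probs) / len(probs)


# Hypothetical decision values from three selected SVMs for one applicant.
print(ensemble_probability([1.2, 0.4, -0.3]))
```

Averaging probabilities rather than raw decision values puts learners trained on different feature subsets on a common scale before they are combined.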
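The preprocessing described in the abstract removes outliers with the box plot method and then applies a logarithmic transformation and normalization. The sketch below assumes the common 1.5 × IQR box plot rule, a log(1 + x) transform for non-negative values, and min-max scaling to [0, 1]; the thesis may use different variants, and the random forest imputation step is not shown. The income figures are invented for illustration.

```python
import math
import statistics


def remove_boxplot_outliers(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def log_transform(values):
    """Apply log(1 + v); assumes every value is non-negative."""
    return [math.log1p(v) for v in values]


def min_max_normalize(values):
    """Rescale values linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column carries no signal
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical monthly incomes with one extreme value.
incomes = [3000, 3500, 4000, 4200, 5000, 5200, 6000, 250000]
cleaned = remove_boxplot_outliers(incomes)
scaled = min_max_normalize(log_transform(cleaned))
print(cleaned)
print([round(v, 3) for v in scaled])
```

Scaling matters for SVMs in particular, because kernel values depend on distances between feature vectors and an unscaled variable can dominate them.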
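[Degree-granting institution]: Zhejiang University of Finance and Economics
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: F724.6; F832.4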
Article ID: 1847618
Article link: https://www.wllwen.com/jingjilunwen/guojimaoyilunwen/1847618.html