Research on Measuring the Default Risk of P2P Online Lending Borrowers Based on the Decision Tree Algorithm
Published: 2018-08-08 12:51
[Abstract]: Since its introduction into China in 2007, P2P online lending, one of the main models of Internet finance, has grown explosively thanks to its low entry threshold, ease of use and other advantages. According to statistics, as of December 31, 2015, China had 2,959 normally operating online lending platforms, and the sector was still expanding rapidly. Against a background of low entry barriers, missing industry standards and weak supervision, platforms absconding with investors' funds or ceasing operations have become commonplace. By the end of December 2015 there were 1,263 problem platforms in total, and the number of new problem platforms in December alone was 27 higher than in November. Platforms that absconded or ceased operations accounted for 87.74% of the problem platforms, and their average operating period was relatively long (13.78 months), which suggests that few of them were outright fraud schemes; most failed because of problems in their own operations. Of the many risks that P2P lending faces, including credit risk, technical risk and legal risk, the most critical is credit risk, that is, the risk that borrowers default. This thesis studies how a lending platform can itself measure borrower default risk, assuming a market environment with sound and effective regulation and a complete legal framework without loopholes. Because data accumulation in China's P2P industry is still incomplete and detailed borrower transaction data are not publicly available, among other factors, the empirical data come from Lending Club, currently the largest P2P lending platform in the United States, whose website discloses borrower information fully and accurately. Taking the perspective of a P2P platform controlling its own risk, the thesis studies methods for measuring the default risk of borrowers.
Using the detailed borrower transaction data published on the Lending Club website, the thesis selects a set of candidate risk features and variables, preprocesses the data through data extraction, data transformation, oversampling and discretization, screens the final modeling features with the information gain ratio, and then builds a C4.5 decision-tree credit risk measurement model on the Weka platform. Evaluation with the confusion matrix, the ROC curve and the AUC value shows that the resulting decision-tree model has good explanatory power. Building on this, the thesis improves the basic C4.5 model with the Bagging and AdaBoost ensemble learning methods, which achieve very good measurement results. Although the data come from a foreign platform, the methods and conclusions still offer some reference value. Finally, based on the results of the decision-tree default-risk assessment, the thesis offers several suggestions for improving the credit-reporting system for platform borrowers.
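The information gain ratio used above to screen features is also the split criterion of C4.5: the information gain IG(Y; X) = H(Y) - H(Y | X) of a discretized feature X with respect to the default label Y, divided by the split information H(X). As a rough illustration only, the sketch below computes it in plain Python. It is not the thesis's code (the thesis worked in Weka, where C4.5 is implemented as J48 and gain-ratio ranking is available as GainRatioAttributeEval); the function names, the feature names `grade` and `home_ownership`, and the six-row dataset are hypothetical and made up, not taken from Lending Club.

```python
import math
from collections import Counter

# Illustrative sketch (assumption): a stdlib re-implementation of the gain ratio,
# not the Weka workflow described in the thesis.

def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def gain_ratio(feature_values, labels):
    """C4.5-style gain ratio of one discretized feature with respect to the labels."""
    n = len(labels)
    # Group the labels by the value the feature takes.
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    # Conditional entropy H(Y | X): size-weighted entropy of each group.
    cond = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - cond
    # Split information H(X); a constant feature has H(X) = 0 and is scored 0.
    split_info = entropy(list(feature_values))
    return info_gain / split_info if split_info > 0 else 0.0

def rank_features(rows, labels, feature_names):
    """Return (name, gain_ratio) pairs sorted from most to least informative."""
    scores = []
    for j, name in enumerate(feature_names):
        column = [row[j] for row in rows]
        scores.append((name, gain_ratio(column, labels)))
    return sorted(scores, key=lambda s: s[1], reverse=True)

# Hypothetical toy data: six loans, two already-discretized features, 1 = default.
rows = [("A", "RENT"), ("A", "OWN"), ("B", "RENT"),
        ("B", "OWN"), ("C", "RENT"), ("C", "OWN")]
labels = [0, 0, 0, 1, 1, 1]

for name, score in rank_features(rows, labels, ["grade", "home_ownership"]):
    print(f"{name}: {score:.3f}")
# grade: 0.421
# home_ownership: 0.082
```

Dividing by H(X) is what distinguishes the gain ratio from plain information gain: it penalizes features that split the data into many small groups, such as an ID-like column, which would otherwise look maximally informative.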
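[Degree-granting institution]: Hunan Normal University
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number (CLC)]: F724.6; F832.4; TP18
Article ID: 2171832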
Article link: https://www.wllwen.com/jingjilunwen/guojimaoyilunwen/2171832.html