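P2P Online Lending Default Prediction Based on User Behavior Data
Topics: default prediction + P2P online lending; Source: Shanghai Normal University, 2017 master's thesis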
[Abstract]: With the growth of Internet finance, the expanding number of P2P platforms, and the rapid rise in demand for online loans, credit risk assessment and default prediction for online lending users have become especially important. In online lending, individual loan amounts are usually small while loan volume is very large, so traditional manual approval can no longer meet business needs. Moreover, most online lending customers have no credit record, and credit assessment based only on basic information struggles to characterize their default risk effectively. Online lending platforms, however, operate on the Internet and therefore have a natural data advantage: making full use of a platform's existing data, integrating third-party data, and mining user behavior in depth to anticipate default is a major direction for future development. This thesis works with loan transaction data that includes user login logs and user information update logs. The data are divided into six sub-modules, including basic information, third-party data, geographic information, login logs, and information update logs, for in-depth mining and analysis, and feature engineering methods from machine learning are introduced to expand and refine the data. The study finds that users who frequently update their personal information before taking a loan are more likely to default than other users. Based on the information extracted through feature engineering, a combination of wrapper and filter feature selection is then used to further screen and reduce the features, constructing the feature subset with the strongest predictive power for user default. A model is trained with the XGBoost framework, yielding a default prediction model whose accuracy and stability both reach the expected level.
By analyzing in depth the business scenario behind the loan data and drawing on the analysis workflow used to build the model, the thesis offers recommendations on applying user behavior log data to default prediction. It argues that such data are well suited as a sample for deriving anti-fraud rules: analyzing and modeling the data yields early-warning indicators, which can be deployed downstream of the main risk control model to adjust a user's risk level, or to flag likely defaults and trigger manual intervention.
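The behavioral finding reported in the abstract (frequent pre-loan information updates associated with default) can be illustrated with a minimal sketch. This is not the thesis's code or data: the record layout, the 7-day window, the threshold of 2 updates, and the function names `updates_before_loan` and `default_rate_by_group` are all assumptions chosen for illustration, and the toy records were deliberately constructed to show the pattern rather than drawn from any real platform.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical toy records; layout and values are illustrative only.
loans = {  # user_id -> (loan date, whether the loan defaulted)
    "u1": (date(2016, 3, 10), True),
    "u2": (date(2016, 3, 12), False),
    "u3": (date(2016, 3, 15), True),
    "u4": (date(2016, 3, 20), False),
}
update_log = [  # (user_id, date of a profile-information update)
    ("u1", date(2016, 3, 8)), ("u1", date(2016, 3, 9)), ("u1", date(2016, 3, 9)),
    ("u2", date(2016, 1, 2)),
    ("u3", date(2016, 3, 13)), ("u3", date(2016, 3, 14)),
    ("u4", date(2016, 3, 19)),
]


def updates_before_loan(loans, update_log, window_days=7):
    """Count each user's updates in the window_days preceding the loan date."""
    counts = Counter()
    for user_id, updated_on in update_log:
        if user_id not in loans:
            continue
        loan_date = loans[user_id][0]
        window_start = loan_date - timedelta(days=window_days)
        if window_start <= updated_on < loan_date:
            counts[user_id] += 1
    # Users with no qualifying updates get an explicit 0.
    return {user_id: counts[user_id] for user_id in loans}


def default_rate_by_group(loans, features, threshold=2):
    """Default rate for users at or above the update threshold vs. the rest."""
    groups = {"frequent": [], "other": []}
    for user_id, (_, defaulted) in loans.items():
        key = "frequent" if features[user_id] >= threshold else "other"
        groups[key].append(defaulted)
    return {k: (sum(v) / len(v) if v else None) for k, v in groups.items()}


features = updates_before_loan(loans, update_log)
rates = default_rate_by_group(loans, features)
print(features)
print(rates)
```

A count like this would be only one candidate feature among many; in the workflow the abstract describes, such features would then pass through filter and wrapper selection before model training.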
[Degree-granting institution]: Shanghai Normal University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: F724.6; F832.4
Article ID: 1974472
Article link: https://www.wllwen.com/jingjilunwen/guojimaoyilunwen/1974472.html