Personal Credit Risk Analysis Using Survival Models with Regularization Methods
Source: Shanghai Normal University, master's thesis, 2017. Document type: degree thesis
Keywords: credit risk; survival analysis; variable regularization; Logistic regression; decision tree
[Abstract]: Credit risk is a key area of banking and a shared concern of stakeholders such as institutions, consumers, and regulators. It is a prominent research topic in finance and in recent years has also drawn the attention of statisticians. Wikipedia (2017) defines credit risk as the risk of loss arising from a debtor's failure to repay a loan or other line of credit. At its core is the default event: a default occurs when a debtor fails to repay its debts under the debt contract or to fulfil its legal obligations.

In studying the credit risk of bank customers, judging creditworthiness only by whether a customer defaults is not accurate enough. Most customers do not default during the study period, so their survival times cannot be observed, which produces the right-censored data familiar from survival analysis. In recent years some studies have applied survival analysis to credit risk modelling. Survival analysis is a dynamic method that predicts not only the probability that an event occurs but also when it occurs. It handles censored and truncated data well, and the estimated survival probabilities show the relationship between risk and characteristic factors more directly. Introducing a time variable into the model also reflects the survival status of each subject better.

This thesis uses desensitized micro-loan data covering 60,508 bank customers and 420 high-dimensional feature variables over a three-year (36-period) study window. Because traditional variable selection methods are challenged at this dimensionality, we first review and compare currently popular regularization methods and try out their algorithms. As the novel step of this work, we then bring the time span to default into the credit analysis model by introducing the period of each customer's first default and arranging the data in the standard survival-data format.
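The survival-data formatting step described above can be illustrated with a minimal sketch. The abstract does not give the thesis's actual data layout, so the input here (a customers-by-periods matrix of 0/1 default flags) and the function name `to_survival_format` are assumptions made only for illustration. Customers who never default within the window are treated as right-censored at the last period.

```python
import numpy as np

def to_survival_format(default_flags):
    """Turn a (customers x periods) 0/1 default matrix into (time, event) arrays.

    time  : 1-based period of the first default, or the window length
            if the customer never defaults (right-censored).
    event : 1 if a default was observed, 0 if censored.

    The input layout is hypothetical; the thesis's own format is not
    given in the abstract.
    """
    flags = np.asarray(default_flags, dtype=bool)
    n_periods = flags.shape[1]

    # A customer has an observed event if any period shows a default.
    event = flags.any(axis=1)

    # argmax returns the index of the first True; it also returns 0 for
    # all-False rows, so those rows are overwritten with n_periods below.
    first_default = flags.argmax(axis=1) + 1
    time = np.where(event, first_default, n_periods)

    return time.astype(int), event.astype(int)


# Toy example with a 4-period window (the thesis uses 36 periods).
flags = [
    [0, 0, 1, 0],  # first default in period 3
    [0, 0, 0, 0],  # never defaults: censored at period 4
    [1, 1, 0, 0],  # first default in period 1
]
time, event = to_survival_format(flags)
```

Each customer then contributes one `(time, event)` pair, which is the input shape that Cox-type and additive hazards models usually expect.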
On this basis we build a Cox multiplicative hazards model with LASSO-MCP regularization and an additive hazards model with LASSO-SCAD regularization. We take the product of each important variable's coefficient estimate and the corresponding feature value as the credit score, establish a classification rule, and evaluate each customer's credit risk comprehensively. By comparing against feedback from banking business experience, we give the economic meaning of some of the important feature variables selected by the survival models. Finally, we compare the two survival models in terms of the selected important variables and predictive performance, and find that the proportional hazards model with LASSO-MCP regularization achieves relatively better classification with fewer feature variables.

The last part of the thesis validates and compares credit risk models built with different methods from several angles. We first fit a traditional binary Logistic regression model and a modern decision tree model to the empirical data, then compare them with the multiplicative and additive survival models from the earlier chapters. Drawing on theoretical analysis and model results, we compare the four models on the ROC curve, which reflects model accuracy, and the KS statistic, which represents discriminatory power. The Cox survival model outperforms the other three on both. This confirms from several angles the good empirical performance of the regularized survival models that incorporate survival time. Overall, for three-year micro-loan data, the Cox proportional hazards model with LASSO-MCP regularization has the highest accuracy and the greatest discriminatory power.
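The scoring and comparison steps can be sketched as follows. This is an illustration under stated assumptions, not the thesis's implementation: the abstract describes the credit score as the product of coefficient estimates and feature values, and the sketch assumes these products are summed into a single linear score per customer. It also assumes that a higher score means higher default risk. The function names and the toy numbers are hypothetical. The KS statistic is computed as the largest gap between the empirical score distributions of defaulters and non-defaulters, and the area under the ROC curve as the share of defaulter/non-defaulter pairs that the score ranks correctly.

```python
import numpy as np

def credit_score(X, beta):
    """Linear score per customer: sum of coefficient * feature value.

    Summing the products is an assumption; the abstract only mentions
    the products themselves.
    """
    return np.asarray(X, dtype=float) @ np.asarray(beta, dtype=float)

def ks_statistic(scores, defaulted):
    """Largest gap between the empirical score CDFs of the two groups."""
    scores = np.asarray(scores, dtype=float)
    defaulted = np.asarray(defaulted, dtype=bool)
    bad = np.sort(scores[defaulted])
    good = np.sort(scores[~defaulted])
    thresholds = np.unique(scores)
    # Fraction of each group with score <= threshold.
    cdf_bad = np.searchsorted(bad, thresholds, side="right") / bad.size
    cdf_good = np.searchsorted(good, thresholds, side="right") / good.size
    return float(np.max(np.abs(cdf_bad - cdf_good)))

def roc_auc(scores, defaulted):
    """Area under the ROC curve via pairwise ranking; ties count half."""
    scores = np.asarray(scores, dtype=float)
    defaulted = np.asarray(defaulted, dtype=bool)
    diff = scores[defaulted][:, None] - scores[~defaulted][None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)


# Toy data: four customers, two selected variables (all values hypothetical).
X = [[1, 0], [0, 1], [1, 1], [0, 0]]
beta = [0.5, 1.0]
scores = credit_score(X, beta)          # 0.5, 1.0, 1.5, 0.0
defaulted = [1, 0, 1, 0]

ks = ks_statistic(scores, defaulted)    # 0.5
auc = roc_auc(scores, defaulted)        # 0.75
```

The pairwise AUC is quadratic in sample size, which is fine for a toy example. For data on the scale of the thesis, a rank-based routine would be the more practical choice.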
[Degree-granting institution]: Shanghai Normal University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: F832.4