An Improved Spectral Clustering Algorithm and Its Application in Personal Credit Evaluation

Published: 2019-04-17 18:21
[Abstract]: Cluster analysis has long been an active research topic in machine learning and data mining, because it helps reveal the relationships among objects. Spectral clustering, which has attracted growing attention in recent years, is an efficient clustering method. Compared with earlier clustering algorithms, it can handle data sets of arbitrary shape and therefore applies to a wider range of problems. Using spectral clustering to extract knowledge from massive data is an important direction for future research. With the recent surge of interest in artificial intelligence, improving the algorithm's accuracy and reducing its time complexity has become urgent. This thesis improves spectral clustering using independent component analysis (ICA) and information entropy theory, which enriches the body of work on spectral clustering and offers a new approach to clustering problems.

The research covers three aspects. First, the result of spectral clustering changes with the scale parameter of the similarity measure function. The similarity measure function describes how similar samples are to one another; choosing a different function, or even just a different scale parameter, can strongly affect the final clustering. The results of this thesis indicate that a good similarity measure function should reflect the distribution of the data well, so information entropy theory is introduced and the scale parameter is optimized by minimizing the information entropy.

Second, the final clustering result depends on how the eigenvectors of the Laplacian matrix are selected. The Laplacian matrix is in essence the result of relaxing the spectral graph partitioning criterion, and its role can be understood as feature extraction from the data set. The Laplacian matrix is central to spectral clustering, but the number of its eigenvectors that are selected affects the final classification result. This thesis makes the bold attempt of replacing the Laplacian matrix with the ICA algorithm, which has performed better in feature extraction in recent years. Theoretical analysis and experimental verification show that spectral clustering improved with ICA and information entropy theory achieves better classification results.

Third, the ICASC algorithm is combined with a personal credit evaluation system and applied to consumer finance. Personal credit evaluation is an important reference for customer classification in the consumer finance industry; it can effectively reduce the bad-debt rate and avoid unnecessary financial losses. The final part of the thesis combines spectral clustering with risk-control methods in consumer finance, and the empirical results show that spectral clustering can effectively identify "bad" customers.

The innovations of the thesis lie in the same three aspects: optimized selection of the scale parameter s in the similarity measure function; replacing the Laplacian transformation with independent component analysis to extract features from the similarity matrix; and an application innovation, namely applying the improved spectral clustering algorithm to the risk-control system in consumer finance. Finally, an outlook is given: future work can focus on the robustness and interpretability of spectral clustering and on combining theory with practical application scenarios.
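The abstract does not state which entropy is minimized, so the following is only a minimal sketch of the general idea of choosing the scale parameter (written s in the abstract, `sigma` here) by entropy minimization. It assumes a Gaussian similarity function and uses the mean binary entropy of the pairwise similarities as the criterion. The function names (`select_sigma`, `pairwise_entropy`), the percentile-based search range, and the toy data are all illustrative assumptions, not the thesis's actual method.

```python
import numpy as np


def pairwise_sq_dists(X):
    """Squared Euclidean distance between every pair of rows of X."""
    sq_norms = np.sum(X ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.maximum(d2, 0.0)  # clip tiny negatives caused by round-off


def gaussian_affinity(d2, sigma):
    """Gaussian similarity w_ij = exp(-d_ij^2 / (2 * sigma^2))."""
    return np.exp(-d2 / (2.0 * sigma ** 2))


def pairwise_entropy(W):
    """Mean binary entropy of the off-diagonal similarities (assumed criterion)."""
    upper = np.triu_indices_from(W, k=1)
    p = np.clip(W[upper], 1e-12, 1.0 - 1e-12)  # avoid log(0)
    return float(np.mean(-p * np.log(p) - (1.0 - p) * np.log(1.0 - p)))


def select_sigma(X, n_candidates=30):
    """Grid-search sigma between the 10th and 90th percentile pairwise distance."""
    d2 = pairwise_sq_dists(X)
    dists = np.sqrt(d2[np.triu_indices_from(d2, k=1)])
    lo, hi = np.percentile(dists, [10, 90])
    candidates = np.linspace(lo, hi, n_candidates)
    entropies = np.array(
        [pairwise_entropy(gaussian_affinity(d2, s)) for s in candidates]
    )
    best = int(np.argmin(entropies))
    return candidates[best], candidates, entropies


# Toy data: two well-separated 2-D blobs (synthetic, for illustration only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(40, 2)),
               rng.normal(5.0, 0.3, size=(40, 2))])

sigma, candidates, entropies = select_sigma(X)
W = gaussian_affinity(pairwise_sq_dists(X), sigma)
print(f"selected sigma = {sigma:.3f}")
```

The search range is bounded on purpose: with this particular criterion the entropy tends to zero as sigma approaches 0 or grows without bound, because all similarities collapse toward 0 or 1, so an unbounded search would return a degenerate value. The resulting matrix `W` could then be passed to whichever feature-extraction step follows, whether Laplacian eigenvectors or ICA.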
[Degree-granting institution]: Shenzhen University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: F832.4

Article ID: 2459678


Article link: https://www.wllwen.com/jingjilunwen/huobiyinxinglunwen/2459678.html

