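Research on Value Extraction and Spatial Association Mining for E-commerce Finance Big Data
Topic of this thesis: e-commerce finance + big data credit reporting; Source: master's thesis, Jiangxi University of Science and Technology, 2017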
[Abstract]: As emerging technologies such as search engines, cloud computing, and artificial intelligence have matured and spread, the volume of data generated in daily life has grown explosively, ushering in the era of "big data". Against this background, the collision between the Internet and the traditional financial industry gave rise to Internet finance. Internet finance meets the needs of micro, small, and medium-sized enterprises and of ordinary financial consumers, compensates for the shortcomings of traditional financial institutions, and offers new ideas for the development of inclusive finance. Among all Internet finance models, e-commerce finance, which is centered on e-commerce platforms, has had the greatest influence and has drawn close attention from the industry and from society. E-commerce finance is inherently a data-driven industry that holds large amounts of multi-source heterogeneous data: on the one hand, the massive historical transaction data of its own e-commerce platforms; on the other hand, external data from the Internet and social media. The ability to extract and mine the value contained in e-commerce finance big data will therefore determine the future competitiveness of the whole industry. To address this problem, this thesis first analyzes the characteristics and value of e-commerce finance big data, mining methods based on spatial association rules in China and abroad, and the current state of big data mining research. It then adopts distributed search engine technology and a customized web crawler to collect the required bank card and Taobao store data from the industry's multi-source heterogeneous data, designs corresponding Spark parallel algorithms to preprocess the data, and builds an inverted index and secondary index files, which provide the data source for the subsequent big data analysis platform.
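As a rough illustration of the inverted index mentioned above, the following sketch builds one in plain Python rather than Spark. The sample records, the whitespace tokenizer, and the function names `build_inverted_index` and `search` are assumptions made for illustration only; the abstract does not describe the thesis's actual index layout or its Spark implementation.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumed tokenizer: lowercase and split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return ids of documents that contain every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical store descriptions, invented for this example.
docs = {
    1: "flagship store outdoor gear",
    2: "flagship store home appliances",
    3: "outdoor gear reseller",
}
index = build_inverted_index(docs)
```

A lookup such as `search(index, "flagship store")` intersects the posting lists of both terms, which is the basic operation a search-engine-backed data source relies on.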
Once the data sources are determined, the MECE (mutually exclusive, collectively exhaustive) analysis method is combined with scores from several financial business experts in the industry to obtain a candidate indicator set for enterprise credit risk evaluation, together with its quantification methods, and the correlation among indicators and the risk grading are analyzed. Next, the random forest algorithm from a big data machine learning library is used for feature selection on the candidate indicator set, and a multi-level spatial association rule algorithm based on a hash structure is designed to mine enterprise risk information, yielding a credit risk assessment and intelligent early-warning model. Finally, the machine learning and mining algorithm libraries, the credit risk assessment and intelligent early-warning model, and the big data storage and distributed computing capabilities are encapsulated into an e-commerce finance big data analysis platform built on Spark on YARN, which is used to verify the accuracy of the model and the practicality of the platform. Using one year of daily operating data from a flagship store on the Taobao platform, its bank card transaction data, and data on its management team as the data source, the platform analyzes the store's business behavior and provides credit risk assessment, credit approval, and post-loan risk early-warning management services. The results show that the credit risk assessment and intelligent early-warning model meets the expected requirements and is highly credible.
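To make the idea of association rule mining over a hash structure more concrete, here is a minimal single-level sketch that counts items and item pairs in hash tables (Python `Counter` objects) and derives rules from support and confidence. It is not the thesis's multi-level spatial algorithm, which the abstract does not specify; the indicator names, thresholds, and the function `mine_pair_rules` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(transactions, min_support, min_confidence):
    """Return rules (antecedent, consequent, support, confidence) over item pairs."""
    n = len(transactions)
    item_counts = Counter()  # hash table: item -> number of transactions
    pair_counts = Counter()  # hash table: (item_a, item_b) -> number of transactions
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Try the rule in both directions: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


# Hypothetical risk indicators observed per store, invented for this example.
transactions = [
    {"late_payment", "high_refund_rate", "risk_high"},
    {"late_payment", "risk_high"},
    {"high_refund_rate"},
    {"late_payment", "high_refund_rate", "risk_high"},
]
rules = mine_pair_rules(transactions, min_support=0.5, min_confidence=0.9)
```

With these invented records, only the pair `late_payment` and `risk_high` clears both thresholds, in both directions. A multi-level variant would repeat the counting at several levels of an indicator hierarchy, which this sketch does not attempt.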
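[Degree-granting institution]: Jiangxi University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP311.13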