Analysis, Research, and Application of the Bisecting K-means Algorithm Based on Financial Customer Profiles

Published: 2018-06-08 00:12

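Topic of this thesis: data warehouse + customer profile; Source: University of Chinese Academy of Sciences (School of Engineering Management and Information Technology), 2016 master's thesis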

[Abstract]: With the rapid development of the Internet in recent years, a large number of enterprises have entered e-commerce and use e-commerce platforms to market and promote their products. Information technology has advanced quickly alongside the Internet, and the Internet finance model has gradually emerged. The arrival of the big data era is both a challenge and an opportunity for financial institutions. Internet finance is not simply, as the name suggests, the provision of financial services over the Internet. That is only its surface form. Behind it lie the accumulation of large volumes of data and strong data processing capability, which correspond to the two key foundations of Internet finance: big data and cloud computing. Relying on these two foundations, Internet finance offers customers a range of online financial services. The brokerage e-commerce platform studied in this thesis is a comprehensive platform with Internet finance attributes that combines product sales, consulting services, investment adviser contracting, and securities trading, all supported by big data and cloud computing. At present there is no financial platform that classifies brokerage e-commerce customers precisely, and customer profiles are still used only to describe user information in simple terms. This thesis uses cloud computing to build data models from behavioral data such as users' basic information, asset information, transaction records, and activity traces on the platform. On the basis of customer profiles, it applies cluster analysis to build a data classification model, divides customers into tiers, and then formulates a personalized marketing plan for each tier, so that products can be marketed and promoted in a more targeted way. Customer tiering is usually implemented with clustering algorithms, and K-means is one of the most commonly used data mining algorithms. Through an in-depth analysis of K-means, the author finds that choosing appropriate initial centroids is the key to how the algorithm behaves. Centroids are generally chosen at random to avoid human intervention, but this causes different runs to produce different total sums of squared error (Sum of the Squared Error, SSE), which ultimately affects the accuracy and stability of the results. To overcome the drawback of random centroid selection, the American scholar Pang-Ning Tan proposed the bisecting K-means algorithm. Its basic idea is to split the set of all points into two clusters, select one of the clusters according to a chosen criterion and split it again, and repeat until K clusters have been produced. The experimental results lead to the conclusion that bisecting K-means is less affected by the choice of centroids and is considerably more efficient and accurate than K-means. This thesis therefore focuses on the analysis and application of bisecting K-means: after brokerage customers have been classified with this algorithm, customers at different tiers are matched with products of different risk levels, achieving differentiated, precise marketing at the strategy level. The main work completed in this thesis includes: (1) building a unified data center that extracts and classifies the various kinds of customer data in a uniform way and applies a series of methods to filter and integrate the data so that it meets the experimental requirements; (2) building a customer profile system with a unified set of profile indicators, and using these indicators to select the customers that form the basis of the cluster analysis; (3) classifying the customer data with the optimized clustering method, dividing customers into tiers, and formulating personalized marketing plans to raise the customer conversion rate. Recognizing how important customer research is to today's Internet finance e-commerce platforms, and building on a systematic review of the classic literature, this study models customers' big data on a cloud computing platform, classifies users algorithmically on the basis of customer profiles, and positions users precisely. The data model was verified and revised through actual personalized marketing and promotion, which raised the conversion rate of brokerage customers and achieved the expected results.
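The bisecting procedure summarized in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not state which criterion is used to pick the cluster to split, so the sketch assumes the cluster with the largest SSE is chosen, and it assumes each bisection is tried several times with the lowest-SSE split kept. The function names (`sse`, `two_means`, `bisecting_kmeans`) and the parameters `n_trials` and `seed` are hypothetical and exist only for this illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def sse(points):
    """Sum of squared errors of a cluster around its own centroid."""
    if not points:
        return 0.0
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)


def two_means(points, rng, max_iter=50):
    """Plain K-means with K = 2 and randomly chosen initial centroids."""
    centers = rng.sample(points, 2)
    for _ in range(max_iter):
        halves = ([], [])
        for p in points:
            nearer = 0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            halves[nearer].append(p)
        if not halves[0] or not halves[1]:
            break  # degenerate split; the caller discards it
        new_centers = [centroid(h) for h in halves]
        if new_centers == centers:
            break  # assignments have stopped changing
        centers = new_centers
    return halves


def bisecting_kmeans(points, k, n_trials=10, seed=0):
    """Split clusters one at a time until k clusters exist."""
    rng = random.Random(seed)
    clusters = [[tuple(p) for p in points]]
    while len(clusters) < k:
        # Assumed selection rule: split the cluster with the largest SSE.
        candidates = [c for c in clusters if len(c) >= 2]
        if not candidates:
            break
        target = max(candidates, key=sse)
        best = None
        for _ in range(n_trials):
            a, b = two_means(target, rng)
            if a and b and (best is None or sse(a) + sse(b) < sse(best[0]) + sse(best[1])):
                best = (a, b)
        if best is None:
            break  # no valid split found (for example, all points identical)
        clusters = [c for c in clusters if c is not target]
        clusters.extend(best)
    return clusters


if __name__ == "__main__":
    # Toy data: three well-separated groups of four points each.
    data = [(x + dx, dy) for x in (0, 10, 30) for dx in (0, 1) for dy in (0, 1)]
    for cluster in bisecting_kmeans(data, k=3):
        print(sorted(cluster), "SSE =", sse(cluster))
```

Because every step only runs a 2-means split and keeps the best of several trials, a single unlucky random initialization has less influence on the final clusters than it would in one run of plain K-means, which is the property the abstract attributes to the bisecting variant.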
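[Degree-granting institution]: University of Chinese Academy of Sciences (School of Engineering Management and Information Technology)
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP311.13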
