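Research on Parallelization of E-Commerce Applications Based on Multi-Core Clusters
Published: 2018-04-14 23:10
Topics: parallel computing + cloud computing; Source: Liaoning Normal University, master's thesis, 2013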
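[Abstract]: Parallel computing offered an epoch-making solution to the computational bottleneck of the single-machine era and drove the development of parallel clusters. As computer clusters have matured, the number of parallel platforms has grown, and each platform has its own parallel strengths. Cloud computing is an Internet-based supercomputing model. It partitions computing tasks across clusters of many machines to obtain massive computing power, storage space and information services, and it is currently the most popular computing model.
In recent years the e-commerce market has shifted from a seller's market to a buyer's market, which has intensified competition among e-commerce companies. Maintaining customer relationships is the foundation of a company's growth and a precondition for its profit. Analyzing customers accurately requires classifying them. Traditional classification relies on experience-based grouping or simple statistics, and a single machine struggles when it faces massive data. To address this problem, this thesis introduces parallel computing into e-commerce customer classification. A multi-table association (join) algorithm is designed to preprocess the data. It links product information collected from e-commerce websites with historical transaction records and converts the result into a form suitable for data mining. A customer classification method is also designed, which uses the FCM (fuzzy c-means) clustering algorithm to analyze the preprocessed customer data.
Tables are traditionally joined in a local parallel database, but this approach falls short when many massive tables spread across the Internet must be joined. A Hadoop cluster, which follows the cloud computing model, can handle such joins. Its efficiency suits large data-intensive workloads, and it is used in many fields. This thesis implements joins of multiple massive tables on a Hadoop cluster and compares the experimental results in detail. The comparison shows that the Hadoop cluster achieves clear parallel efficiency when joining massive tables.
The analysis that follows preprocessing uses the FCM fuzzy clustering algorithm, which is widely used in multivariate statistical analysis, to classify customer data. Traditional methods rely on experience or simple statistics and use a single indicator. This approach broadens the indicator system to multiple indicators that describe a customer's consumption pattern. Experiments were run on transaction data from VANCL. Customers were divided by consumption pattern into four categories: premium customers, general customers, small customers and potential customers. The results confirm the clustering quality of FCM. They also confirm the efficiency of a MATLAB multi-core parallel cluster for running complex algorithms in parallel.
The methods designed in this thesis can also be applied to large-scale data processing and customer classification analysis in finance, which gives them practical value.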
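The abstract does not spell out the multi-table association algorithm. A standard way to join large tables on Hadoop MapReduce is a reduce-side join, which has three phases:

1. Each mapper tags a record with its source table and emits it under the join key.
2. The shuffle groups together the records that share a key.
3. Each reducer combines the records from the two tables.

The sketch below simulates these three phases in memory using plain Python. It illustrates the general pattern under assumed table layouts and is not the thesis's Hadoop code. The layouts are a hypothetical `products` table of `(product_id, name, price)` and a hypothetical `orders` table of `(customer_id, product_id, quantity)`.

```python
from collections import defaultdict
from itertools import product as cartesian


def map_phase(table_name, rows, key_index):
    """Mapper: emit (join_key, (source_tag, record)) for every row."""
    for row in rows:
        yield row[key_index], (table_name, row)


def shuffle(pairs):
    """Shuffle: group mapper output by key, as Hadoop does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: split tagged records by source table and pair every product with every order."""
    product_rows = [rec for tag, rec in values if tag == "products"]
    order_rows = [rec for tag, rec in values if tag == "orders"]
    for p, o in cartesian(product_rows, order_rows):
        # Joined record: (customer_id, product_id, name, price, quantity)
        yield (o[0], key, p[1], p[2], o[2])


def reduce_side_join(products, orders):
    """Join hypothetical products and orders tables on product_id."""
    pairs = list(map_phase("products", products, key_index=0))
    pairs += list(map_phase("orders", orders, key_index=1))
    joined = []
    for key, values in sorted(shuffle(pairs).items()):
        joined.extend(reduce_phase(key, values))
    return joined


if __name__ == "__main__":
    products = [("p1", "shirt", 99.0), ("p2", "shoes", 259.0)]
    orders = [("c1", "p1", 2), ("c2", "p2", 1), ("c1", "p2", 1)]
    for row in reduce_side_join(products, orders):
        print(row)
```

On a real cluster, the map and reduce functions would run as separate Hadoop tasks, and the framework would perform the shuffle. Joins across more than two tables are commonly done by chaining such jobs, with one join key per job.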
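The abstract names FCM but gives no parameters. The textbook fuzzy c-means iteration uses a fuzzifier m > 1 and alternates between two updates until the memberships stop changing:

- Each center c_j becomes the mean of the points, weighted by u_ij^m.
- Each membership becomes u_ij = 1 / Σ_k (d_ij / d_ik)^(2/(m-1)), where d_ij is the Euclidean distance from point i to center j.

The following is a minimal pure-Python sketch of that standard algorithm, not the thesis's MATLAB implementation. The defaults (m = 2, the tolerance and random initialization) are assumptions. The two-feature customer vectors in the demo are hypothetical.

```python
import random


def fcm(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Fuzzy c-means. Returns (centers, memberships), where memberships[i][j]
    is the degree to which point i belongs to cluster j (each row sums to 1)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        u.append([v / s for v in row])
    power = 2.0 / (m - 1.0)
    centers = []
    for _ in range(max_iter):
        # Center update: mean of the points weighted by u_ij^m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / total
                            for d in range(dim)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)).
        new_u = []
        for x in points:
            dists = [sum((x[d] - cj[d]) ** 2 for d in range(dim)) ** 0.5
                     for cj in centers]
            if 0.0 in dists:
                # The point sits exactly on a center: give it full membership there.
                k = dists.index(0.0)
                new_u.append([1.0 if j == k else 0.0 for j in range(c)])
                continue
            new_u.append([1.0 / sum((dists[j] / dists[k]) ** power for k in range(c))
                          for j in range(c)])
        # Stop once no membership changes by more than tol.
        shift = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u


if __name__ == "__main__":
    # Hypothetical normalized features per customer: (order frequency, total spend).
    customers = [(0.10, 0.05), (0.12, 0.08), (0.09, 0.06),
                 (0.85, 0.90), (0.80, 0.95), (0.88, 0.87)]
    centers, u = fcm(customers, c=2)
    for x, row in zip(customers, u):
        print(x, "-> cluster", row.index(max(row)), [round(v, 3) for v in row])
```

To mirror the four customer categories, one would call `fcm(features, c=4)` on normalized per-customer indicator vectors. Each customer is then assigned to the cluster with the largest membership. The remaining memberships show how ambiguous each assignment is, which is the main advantage of fuzzy clustering over hard clustering.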
[Degree-granting institution]: Liaoning Normal University
[Degree level]: Master's
[Year conferred]: 2013
[Classification number]: TP311.13
Article ID: 1751431
Article link: https://www.wllwen.com/jingjilunwen/dianzishangwulunwen/1751431.html