Research on Recommendation Methods Based on the SAP HANA Database

Published: 2018-06-09 22:03

Topic of this paper: SAP + HANA; Source: Beijing Forestry University, master's thesis, 2016

【Abstract】: Over two decades of e-commerce development in the Internet era, academic research on e-commerce has advanced steadily, and studies of consumer behavior have become increasingly common. The ability to process large volumes of data quickly and analyze them in real time determines whether a company can respond rapidly to market changes and gain an advantage. Against this background, faster analysis has become more urgent, and SAP HANA (SAP High-Performance Analytic Appliance) emerged in response. It can analyze, store, and process big data in real time, helping enterprises realize the full value of their business data, seize opportunities, and make decisions in real time. This study builds on the HANA database and the components installed on it, and carries out predictive analysis on one year of transaction data that Ponpare, a leading Japanese group-buying site, published on the data competition platform kaggle. The main work of the thesis is as follows:

1. Design of the overall system architecture, ensuring that the complete workflow runs smoothly inside HANA. The architecture consists of a data extraction layer, a data warehouse layer, and a data processing and analysis layer. The data is initially stored in an Oracle database, which serves as the data source. EIM (Enterprise Information Management) is the extraction tool that loads the data into HANA, and PAL together with HANA-based R serve as the algorithm implementation tools for preprocessing and analysis. Data moves between these components without obstruction, which keeps the system coherent.

2. Use of HANA PAL (Predictive Analysis Library) in combination with AFM to perform data merging, missing-value imputation, and numerical normalization, producing data suitable for the study. Before data mining, the thesis describes and analyzes the customers' browsing and purchase records, their personal information, and the raw coupon information, and preprocesses the initial data provided by the website in order to improve mining efficiency and reduce the time mining requires.

3. Implementation of the recommendation algorithm in the HANA-based R environment. First, the cbind function combines vectors and matrices into a new matrix. Second, the attributes are assigned different weights. Finally, the cosine similarity between each user's attributes and the coupons is computed and ranked, yielding the IDs of the 10 coupons the customer is most likely to purchase. The accuracy of the recommendations is obtained by comparing the type and region of the products users actually purchased with those of the recommended products.

The thesis combines data mining, which has recently become popular, with HANA, the database SAP introduced in recent years. The latest components, EIM and PAL, are used to carry out data migration, data preprocessing, and predictive analysis.
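The ranking step described in the abstract (weight the attributes, compute the cosine similarity between a user's attributes and each coupon, keep the highest-scoring coupon IDs) can be illustrated with a minimal sketch. The thesis itself used R inside HANA; the Python below is only an illustration under stated assumptions. The function names, the four feature columns, the weights, and the sample data are all hypothetical, and the weighting scheme shown (weights applied inside the dot product and the norms) is one common choice, not necessarily the one the author used.

```python
import math


def weighted_cosine(a, b, weights):
    """Cosine similarity where each attribute i contributes with weights[i]."""
    dot = sum(w * x * y for w, x, y in zip(weights, a, b))
    norm_a = math.sqrt(sum(w * x * x for w, x in zip(weights, a)))
    norm_b = math.sqrt(sum(w * y * y for w, y in zip(weights, b)))
    if norm_a == 0 or norm_b == 0:
        # A vector with no active attributes is treated as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(user_profile, coupons, weights, top_n=10):
    """Return the IDs of the top_n coupons most similar to the user profile."""
    scored = [
        (coupon_id, weighted_cosine(user_profile, features, weights))
        for coupon_id, features in coupons.items()
    ]
    # Highest similarity first; sort is stable, so ties keep insertion order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [coupon_id for coupon_id, _ in scored[:top_n]]


# Hypothetical feature columns: [genre_food, genre_travel, area_tokyo, area_osaka]
weights = [2.0, 2.0, 1.0, 1.0]  # assumed: genre counts more than area
user = [1, 0, 1, 0]             # bought food coupons in Tokyo
coupons = {
    "c1": [1, 0, 1, 0],         # food, Tokyo
    "c2": [0, 1, 1, 0],         # travel, Tokyo
    "c3": [0, 1, 0, 1],         # travel, Osaka
}

top = recommend(user, coupons, weights, top_n=2)
```

With this sample data, the coupon that matches the user on both genre and area ranks first, and the coupon that shares only the area ranks second. Passing `top_n=10` on a full coupon table would correspond to the 10 coupon IDs the abstract mentions.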
【Degree-granting institution】: Beijing Forestry University
【Degree level】: Master's
【Year degree awarded】: 2016
【Classification number】: TP311.13;TP391.3

【Related literature】