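Design and Implementation of a MapReduce-Based Advertisement Click-Through Rate Prediction System
Published: 2018-06-29 00:02
Topics: computational advertising + Bayesian network; Source: Yunnan University, master's thesis, 2016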
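[Abstract]: With the rapid development of information technology, the era of "big data" has arrived; backed by big data and the Internet, many traditional practices have been updated or even overturned. Search advertising has become one of the Internet industry's main sources of revenue. It is usually run as a keyword auction in which advertisers pay for the keywords they win, and the main payment model is pay per click (PPC). The advertiser's payment is the cost per click (CPC), an ad's popularity is described by its click-through rate (CTR), and the advertising medium's revenue is CTR × CPC, so predicting ad click-through rates is particularly important. This thesis first processes massive advertising data with the MapReduce framework, then builds a Bayesian-network-based similarity model over ad keywords, and then performs probabilistic inference on the large-scale Bayesian network stored in HBase to obtain the click-through rate of the ad to be predicted. Based on these ideas, an ad click-through rate prediction system was implemented that successfully handles click-through rate prediction on massive data. The main work of the thesis is summarized as follows:
1) Data preprocessing. The MapReduce framework is used to analyze and process user search logs, and the valuable data extracted from them is stored in HBase.
2) Construction and storage of a large-scale Bayesian network. Using the MapReduce distributed computing framework, with ad keywords as the nodes of the Bayesian network, the thesis first constructs the network's directed acyclic graph structure, then computes the conditional probability table of each node in parallel based on that structure, and finally stores the fully constructed Bayesian network in HBase tables in parallel as key-value pairs.
3) Ad click-through rate prediction based on the large-scale Bayesian network. Probabilistic inference on the Bayesian network is transformed into data query processing on HBase, and inference over the large-scale network is implemented with the MapReduce programming model to predict ad click-through rates.
4) Based on the research above, a corresponding system is designed with three modules: a data preprocessing module, a large-scale Bayesian network construction module, and an ad click-through rate prediction module, which together implement the MapReduce-based ad click-through rate prediction system.
5) Finally, functional and non-functional tests of the system are carried out on real commercial data.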
[Abstract]: With the rapid development of information technology, the era of "big data" has arrived; relying on big data and the Internet, many traditions have been updated or even overturned. Search advertising has become one of the main sources of revenue in the Internet industry. It usually operates as a keyword auction in which advertisers pay to obtain keywords, and the main payment model is pay per click (PPC). Advertisers pay the cost per click (CPC); the popularity of an ad is described by its click-through rate (CTR); and the revenue of the advertising medium is CTR × CPC. Therefore, predicting ad click-through rates becomes particularly important. In this thesis, we first use the MapReduce framework to process massive advertising data, then construct a similarity model of ad keywords based on a Bayesian network, and then perform probabilistic inference on the large-scale Bayesian network stored in HBase to obtain the click-through rate of the ad to be predicted. Based on these ideas, an ad click-through rate prediction system is implemented, which successfully solves the problem of click-through rate prediction on massive data. The main work of the thesis is summarized as follows: 1) Data preprocessing. We use the MapReduce framework to analyze and process user search logs, extract the valuable data, and store it in HBase. 2) Construction and storage of a large-scale Bayesian network. Using the MapReduce distributed computing framework, with ad keywords as the nodes of the Bayesian network, we first construct the network's directed acyclic graph structure, then compute the conditional probability table of each node in parallel based on that structure, and finally store the constructed Bayesian network in HBase tables in parallel as key-value pairs. 3) Ad click-through rate prediction based on the large-scale Bayesian network. Probabilistic inference on the Bayesian network is transformed into data query processing on HBase, and inference over the large-scale network is implemented with the MapReduce programming model to predict the click-through rate. 4) Based on the research above, we design the corresponding system with three modules: a data preprocessing module, a large-scale Bayesian network construction module, and an ad click-through rate prediction module. 5) Finally, functional and non-functional tests of the system are carried out on real commercial data.
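Items 2) and 3) of the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not give the log schema, the structure-learning method, or the HBase row-key design, so the toy records, the fixed two-node graph, the `row_key` layout, and all function names below are assumptions made for illustration. A plain Python dict stands in for the HBase table, and the map and reduce steps are simulated in memory rather than run on Hadoop. The sketch counts (node, parent values, node value) combinations in a map/reduce style, normalizes them into conditional probability table entries stored as key-value pairs, and then answers a joint-probability query purely by key lookups.

```python
from collections import defaultdict

# Hypothetical toy data: one dict per search session, 1 = keyword present.
RECORDS = [
    {"shoes": 1, "sneakers": 1},
    {"shoes": 1, "sneakers": 0},
    {"shoes": 1, "sneakers": 1},
    {"shoes": 0, "sneakers": 0},
]
# Hypothetical DAG: node -> tuple of parent nodes (structure assumed given).
PARENTS = {"shoes": (), "sneakers": ("shoes",)}


def row_key(node, parent_values, value):
    """Hypothetical row-key layout: node|parent values|node value."""
    return f"{node}|{','.join(map(str, parent_values))}|{value}"


def map_record(record, parents):
    """Map step: emit one (key, 1) pair per node for a single log record."""
    for node, pa in parents.items():
        parent_values = tuple(record[p] for p in pa)
        yield (node, parent_values, record[node]), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted ones per key."""
    counts = defaultdict(int)
    for key, one in pairs:
        counts[key] += one
    return counts


def build_cpt(records, parents):
    """Normalize counts into P(node = value | parents), keyed by row_key."""
    pairs = (pair for record in records for pair in map_record(record, parents))
    counts = reduce_counts(pairs)
    totals = defaultdict(int)
    for (node, parent_values, _), n in counts.items():
        totals[(node, parent_values)] += n
    return {
        row_key(node, parent_values, value): n / totals[(node, parent_values)]
        for (node, parent_values, value), n in counts.items()
    }


def joint_probability(cpt, parents, assignment):
    """Chain-rule product of one table lookup per node for a full assignment."""
    p = 1.0
    for node, pa in parents.items():
        parent_values = tuple(assignment[q] for q in pa)
        # Combinations never seen in the data get probability 0 (no smoothing).
        p *= cpt.get(row_key(node, parent_values, assignment[node]), 0.0)
    return p


if __name__ == "__main__":
    cpt = build_cpt(RECORDS, PARENTS)
    print(cpt[row_key("sneakers", (1,), 1)])
    print(joint_probability(cpt, PARENTS, {"shoes": 1, "sneakers": 1}))
```

Keying each table entry by node, parent configuration, and value is one way to make inference reducible to point lookups, which matches the abstract's description of turning probabilistic inference into data queries. A real deployment would also need smoothing for unseen combinations and a key design suited to HBase scans.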
[Degree-granting institution]: Yunnan University
[Degree level]: Master's
[Year degree conferred]: 2016
[Classification number]: TP311.52
Article ID: 2079934
Article link: https://www.wllwen.com/wenyilunwen/guanggaoshejilunwen/2079934.html