
Ad Click-Through Rate Prediction Based on Recurrent Neural Networks

Published: 2018-01-19 04:04

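Keywords: online advertising; ad click-through rate; logistic regression; random forest; recurrent neural network; LSTM. Source: Zhejiang Sci-Tech University, 2016 master's thesis. Document type: degree thesis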


[Abstract]: Online advertising, which grew up alongside the Internet, has developed rapidly over the past few decades. Ad click-through rate (CTR) is a central research topic in computational advertising and is attracting increasing attention. The main approach today is to use machine learning to predict CTR from historical data. Accurate CTR prediction makes ad placement more precise, raises the actual click-through rate, and increases revenue. A linear model can predict CTR simply, but its learning capacity is limited: it cannot learn effectively from a growing number of data features, and it tends to overfit during training, which impairs what the model learns from the features. Models based on neural networks use nonlinear activation functions and a multi-layer node structure, so they can better learn the complex relationships among large numbers of nonlinear features and thereby improve predictive power. Among them, the recurrent neural network (RNN) is a neural network whose structure contains cycles, which can store a neuron's output from the previous time step, and which has strong optimization and computation capability. The main work of this thesis covers three aspects.

(1) Feature processing is tailored to each model. For the logistic regression model, explicit features are concatenated into combinations to extract latent user attributes, and a hash mapping then converts feature values of different types into values of a single type. For the random forest model, a feature dictionary is built, entries whose frequency is too low are filtered out, and the features are then one-hot encoded. For the neural-network-based models, the frequency of each feature value is computed first and a feature-frequency dictionary is built to convert string features into integer features; the converted features are then min-max normalized so that the values of every feature fall in the range [0,1].

(2) Recurrent neural networks have already been applied to CTR prediction, but the model is trained by gradient descent, and its gradients may explode or vanish as training approaches a minimum, which degrades the predictions. This thesis predicts CTR with a recurrent neural network improved with LSTM (long short-term memory), using LSTM to modify the RNN so as to prevent exploding or vanishing gradients. Experimental results show that the LSTM-improved recurrent neural network performs well at predicting CTR.

(3) The logistic regression model, the random forest model, the BP (back propagation) neural network model, the recurrent neural network model, and the LSTM-improved recurrent neural network model are all implemented in Python. The recurrent neural network is trained with the sigmoid function and with the ReLU function in turn; the experiments show that ReLU converges faster and yields better predictions. Models are evaluated with logloss, which reflects the accuracy of the predicted click-through rate better than AUC (area under the ROC curve) does.
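The closing point about logloss and AUC can be illustrated with a small sketch. This is not code from the thesis: the function names, the toy labels, and both score lists are assumptions made up for illustration. AUC depends only on how the predictions rank clicks against non-clicks, so pushing every predicted probability upward without changing the order leaves AUC unchanged, while logloss penalizes the overestimated click probabilities.

```python
import math


def log_loss(labels, probs, eps=1e-15):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def auc(labels, probs):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 2 clicks in 8 impressions.
labels = [0, 0, 0, 0, 0, 0, 1, 1]
calibrated = [0.05, 0.10, 0.10, 0.15, 0.20, 0.30, 0.60, 0.70]
# Same ranking, but every probability is inflated toward 1.
inflated = [p ** 0.25 for p in calibrated]

auc_calibrated = auc(labels, calibrated)
auc_inflated = auc(labels, inflated)
ll_calibrated = log_loss(labels, calibrated)
ll_inflated = log_loss(labels, inflated)

print("AUC:    ", auc_calibrated, auc_inflated)
print("logloss:", ll_calibrated, ll_inflated)
```

In this toy setting both score lists get the same AUC, but the inflated list has a clearly higher logloss. This is one plausible reading of why a metric sensitive to the predicted probabilities themselves suits CTR prediction, where the predicted value feeds directly into placement and revenue decisions.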
[Degree-granting institution]: Zhejiang Sci-Tech University
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: F713.8; TP183

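[Citing literature]

Related master's theses (1 entry)

1 Zhu Jingyang (朱静阳); Research on a Heart Disease Risk Model Based on LDBN [D]; Zhengzhou University; 2017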



Article ID: 1442526


Article link: https://www.wllwen.com/jingjilunwen/guojimaoyilunwen/1442526.html

