
Research on a Web Advertisement Clustering Method Based on Multi-Feature Information Fusion

Published: 2018-04-20 14:51

  Topic: Web advertising + multi-feature; Source: Harbin Institute of Technology, master's thesis, 2014


【Abstract】: With the rapid growth of the Internet, Web advertising has become an important source of revenue for network service providers and an effective channel through which many traditional industries promote their brands and products. Massive Web advertising data conceal high-value information and knowledge, so mining them effectively has become a key problem for many Internet applications. Cluster analysis is an important foundational technique in Web advertising data mining: it can be used to analyze competitors, and it can help governments and assessment agencies evaluate and forecast economic development. Web advertising data contain several kinds of features, but no single kind describes a Web advertisement completely; fusing multiple kinds of features allows a complete description. This thesis therefore studies a Web advertisement clustering method based on multi-feature information fusion. The main work is as follows:
(1) The characteristics of Web advertisements are analyzed, and relevant datasets are collected and constructed. Feature extraction methods for Web advertising data are studied, and one text feature extraction method based on fuzzy matching and four image feature extraction methods are implemented.
(2) The feature space of Web advertising data is high-dimensional and sparse, and the separation between two clusters is often determined by a very small number of features. To distinguish the importance of these few features, the thesis improves the objective function of EW-kmeans so that it takes into account the effect of both between-cluster and within-cluster distances on the clustering result, proposes a third-order tensor weighted k-means method based on a discriminative subspace (Dkmeans), and gives the related theoretical proofs. Experimental results show that Dkmeans achieves better clustering results than the latest related clustering algorithms on all 6 public datasets.
(3) Fusion experiments are carried out with different combinations of the Web advertisement features. The experiments show that every combination improves Web advertisement clustering to some degree, and that fusing all features gives the best result, which confirms that multi-feature fusion can improve the clustering of Web advertisements.
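Points (2) and (3) of the abstract can be illustrated with a small sketch. It is not the thesis's Dkmeans: the abstract does not state the Dkmeans objective, so the code below only shows the kind of baseline it starts from, under the assumption that "EW-kmeans" refers to entropy-weighted k-means, where each cluster keeps its own feature weights and features with small within-cluster dispersion receive larger weights. The fusion step (min-max scaling each feature block and concatenating) is likewise an assumption, and the names `fuse_features`, `entropy_weighted_kmeans`, `gamma`, and `init` are hypothetical.

```python
import math


def fuse_features(*blocks):
    """Min-max scale every column of every feature block, then concatenate
    the blocks object by object (assumed fusion scheme, not the thesis's)."""
    fused = [[] for _ in blocks[0]]
    for block in blocks:
        columns = list(zip(*block))
        lows = [min(col) for col in columns]
        # A constant column has zero range; fall back to 1.0 to avoid dividing by zero.
        spans = [(max(col) - min(col)) or 1.0 for col in columns]
        for fused_row, row in zip(fused, block):
            fused_row.extend((v - lo) / sp for v, lo, sp in zip(row, lows, spans))
    return fused


def entropy_weighted_kmeans(data, k, gamma=0.5, iters=20, init=None):
    """Per-cluster feature-weighted k-means (entropy-weighted variant).

    init: indices of the objects used as initial centers (defaults to the first k).
    gamma: larger values push the weights toward uniform.
    """
    n_features = len(data[0])
    init_indices = range(k) if init is None else init
    centers = [list(data[i]) for i in init_indices]
    weights = [[1.0 / n_features] * n_features for _ in range(k)]
    labels = [0] * len(data)

    for _ in range(iters):
        # Assignment step: nearest center under that cluster's own feature weights.
        for i, x in enumerate(data):
            labels[i] = min(
                range(k),
                key=lambda c: sum(
                    w * (a - z) ** 2 for w, a, z in zip(weights[c], x, centers[c])
                ),
            )
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if not members:
                continue  # empty cluster: keep its previous center and weights
            columns = list(zip(*members))
            # Center update: per-feature mean of the cluster members.
            centers[c] = [sum(col) / len(members) for col in columns]
            # Within-cluster dispersion of each feature.
            disp = [
                sum((v - z) ** 2 for v in col) for col, z in zip(columns, centers[c])
            ]
            # Weight update: softmax of -dispersion / gamma
            # (shifted by the minimum for numerical stability).
            low = min(disp)
            raw = [math.exp(-(d - low) / gamma) for d in disp]
            total = sum(raw)
            weights[c] = [r / total for r in raw]
    return labels, centers, weights
```

To try it, pass one list of rows per feature type (for example a text block and an image block with the same number of objects) to `fuse_features`, then cluster the fused rows with `entropy_weighted_kmeans(fused, k=2, init=(0, 3))`. The returned weights sum to 1 within each cluster, so they can be read as the relative importance of each fused feature for that cluster. According to the abstract, Dkmeans additionally accounts for between-cluster distance; that term is not reproduced here.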
【Degree-granting institution】: Harbin Institute of Technology
【Degree level】: Master's
【Year conferred】: 2014
【Classification numbers】: TP393.09; TP391.41; TP391.1





