Research and Application of Similarity-Based Blog Recommendation Technology
Posted: 2018-11-11 15:02
【Abstract】: With the spread of Web 2.0 applications, blogs have propagated at an unprecedented speed and now hold an enormous amount of information. Among so many blogs, users find it very difficult to locate the blogs or blog posts that interest them most, and bloggers find it equally difficult to attract more visits to their own blogs. The emergence of blog search engines has solved this problem to some extent, but for reasons of both technology and user requirements, they cannot truly meet users' needs.
This thesis studies the recommendation algorithms in common use and analyzes both bloggers' social information and their blog posts. Building on existing techniques, it designs a similarity-based blog recommendation algorithm that measures how similar two blogs are from two aspects: the similarity of their blog posts and the similarity of their bloggers' social information. Before presenting the algorithm, the thesis introduces the concepts of blog post similarity and blogger social-information similarity and explains the advantages of a similarity-based approach. It then constructs formulas for blogger social-information similarity and for blog post similarity and combines the two into an overall similarity; the weights of the two components are set by linear combination, with their values chosen with reference to the literature. In the experiments, the open-source crawler Heritrix is used to crawl relevant blogs from Sina.com as experimental data; the crawled data are processed and the relevant information is stored in a database.
The improved algorithm is evaluated under two criteria. The first compares its accuracy with that of a text-based algorithm, which suits automatic evaluation by computer; the second relies on human judges to decide whether each recommended blog is similar to the target blog. Comparison and analysis of the experimental results demonstrate the effectiveness of the improved algorithm and provide technical support for blog recommendation.
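The abstract says only that post similarity and blogger social-information similarity are computed separately and then merged by a weighted linear combination; it does not give the formulas or the weight values. The sketch below illustrates that structure under stated assumptions: post similarity is taken as cosine similarity of term-frequency vectors, social similarity as Jaccard overlap of sets of social features (such as friends or tags), and the overall score as sim(x, y) = w * sim_post(x, y) + (1 - w) * sim_social(x, y) with a placeholder w = 0.6. The names `Blog`, `combined_similarity`, `recommend` and `precision` are hypothetical, and the thesis's actual measures and weights may differ.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Blog:
    """Hypothetical blog record: tokenized post text plus a set of social features."""
    blog_id: str
    # Assumption: posts are already tokenized (Chinese text needs word segmentation first).
    post_terms: list[str]
    # Assumption: social information modeled as a set of ids/tags (friends, followers, tags, ...).
    social: set[str] = field(default_factory=set)


def cosine_similarity(a: list[str], b: list[str]) -> float:
    """Cosine similarity of two raw term-frequency vectors (assumed post-similarity measure)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def jaccard_similarity(a: set[str], b: set[str]) -> float:
    """Jaccard overlap of two social-feature sets (assumed social-similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_similarity(x: Blog, y: Blog, w_post: float = 0.6) -> float:
    """Linear combination: w_post * post similarity + (1 - w_post) * social similarity.

    The default w_post = 0.6 is a placeholder, not the thesis's weight.
    """
    if not 0.0 <= w_post <= 1.0:
        raise ValueError("w_post must lie in [0, 1]")
    s_post = cosine_similarity(x.post_terms, y.post_terms)
    s_social = jaccard_similarity(x.social, y.social)
    return w_post * s_post + (1.0 - w_post) * s_social


def recommend(target: Blog, candidates: list[Blog], k: int = 3, w_post: float = 0.6) -> list[str]:
    """Ids of the k candidates with the highest combined similarity to the target."""
    scored = [
        (combined_similarity(target, c, w_post), c.blog_id)
        for c in candidates
        if c.blog_id != target.blog_id  # never recommend the target to itself
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first; ties broken by id
    return [blog_id for _, blog_id in scored[:k]]


def precision(recommended: list[str], relevant: set[str]) -> float:
    """Share of recommended ids found in a judged-relevant set (assumed automatic metric)."""
    if not recommended:
        return 0.0
    return sum(r in relevant for r in recommended) / len(recommended)


if __name__ == "__main__":
    target = Blog("t", ["travel", "photo", "beijing"], {"u1", "u2"})
    candidates = [
        Blog("a", ["travel", "photo", "shanghai"], {"u1"}),
        Blog("b", ["stock", "market"], {"u1", "u2"}),
        Blog("c", ["travel", "beijing", "photo"], {"u2", "u3"}),
    ]
    top = recommend(target, candidates, k=2)
    print(top)                    # ['c', 'a']
    print(precision(top, {"c"}))  # 0.5
```

A single scalar weight keeps the combination easy to tune against an automatic accuracy criterion; the `precision` helper is only one plausible reading of the abstract's "accuracy", and the human-judgment evaluation is not modeled here.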
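【Degree-granting institution】: Inner Mongolia University of Science and Technology
【Degree level】: Master's
【Year degree awarded】: 2012
【CLC classification number】: TP391.3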
Article No.: 2325208
Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/2325208.html