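Research on Improving the PageRank Algorithm Based on Time Feedback and Classification Techniques
Published: 2018-06-21 08:07
Topic: PageRank + search engine; Reference: Beijing University of Chemical Technology, 2013 master's thesis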
[Abstract]: In the current information age, the Internet is developing rapidly and large amounts of unordered information are constantly being produced online. When users want to find the information they care about, they rely on search engines to return results quickly and accurately. This places higher demands on search technology, and the web page ranking algorithm naturally becomes the key problem in improving a search engine. Among the traditional ranking algorithms from the early days of search engines, PageRank and HITS are the two classic ones. Both are based on the link structure of web pages, both serve as the foundation for ranking-algorithm improvements in China and abroad, and a number of effective improved algorithms have since been derived from them.
This thesis first describes the research background and significance of search engine ranking algorithms and the state of search engine development in China and abroad, and analyzes the working principles and technology of search engines as well as the evaluation metrics for search engine sites. It then analyzes the strengths and weaknesses of the traditional PageRank and HITS algorithms, which lays the foundation for the combined improvement of PageRank proposed here.
The main contribution is to further fuse existing improvements to PageRank: the thesis proposes a combined improved algorithm that integrates web page classification with a PageRank algorithm using a time feedback factor, and revises the formula for computing the PR value accordingly. The improved algorithm is then designed and validated experimentally. Comparing results before and after the improvement shows that the improved algorithm can raise the precision and recall of a search engine to some extent.
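The abstract does not reproduce the thesis's revised PR formula, so the following is only a minimal sketch of the classical, link-structure-based PageRank iteration it builds on. The optional `weights` argument is a hypothetical hook, not the author's method: it shows one generic place where a per-page score (for example, one derived from a time factor or a page category) could bias the ranking, here through the teleport distribution. All function and parameter names are assumptions made for illustration.

```python
from typing import Dict, List, Optional


def pagerank(
    links: Dict[str, List[str]],
    damping: float = 0.85,
    weights: Optional[Dict[str, float]] = None,
    tol: float = 1e-10,
    max_iter: int = 200,
) -> Dict[str, float]:
    """Power-iteration PageRank over an adjacency dict {page: [outlinks]}.

    `weights` is a hypothetical per-page bias (not taken from the thesis);
    when omitted, the teleport distribution is uniform as in classical PageRank.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)

    if weights is None:
        teleport = {p: 1.0 / n for p in pages}
    else:
        total = sum(weights.get(p, 0.0) for p in pages)
        if total <= 0:
            raise ValueError("weights must sum to a positive value")
        teleport = {p: weights.get(p, 0.0) / total for p in pages}

    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by pages without outlinks is redistributed via teleport.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {
            p: (1.0 - damping + damping * dangling) * teleport[p] for p in pages
        }
        # Each page passes a damped, equal share of its rank to its outlinks.
        for p in pages:
            outlinks = links.get(p, [])
            if outlinks:
                share = damping * rank[p] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    graph = {"A": ["B"], "B": ["A"], "C": ["A"]}
    for page, score in sorted(pagerank(graph).items()):
        print(page, round(score, 4))
```

Biasing the teleport distribution is a design choice made here because it keeps the scores normalized to 1. The thesis may well fold its time and classification factors into the formula differently.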
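[Degree-granting institution]: Beijing University of Chemical Technology
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP391.3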
Article ID: 2047844
Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/2047844.html