Design and Implementation of a Search Engine Ranking Algorithm Based on User Log Analysis
[Abstract]: With the rapid development of the Internet, finding useful data in massive amounts of information has become increasingly important. By crawling and organizing information on the Web, a search engine gives users a high-quality query interface and makes target information much easier to obtain. Search engines have become an indispensable tool for accessing network resources, but because the volume of information on the Internet is so large, they cannot always return satisfactory results. First, a query typically returns a large number of related results, and the results the user cares about most are often not shown at the top or in the most prominent position. Second, users understand search engines differently, and most cannot express their information need accurately in a query, which makes the results inaccurate. It is therefore important to infer user intent from search behavior in order to improve the accuracy of result ranking. Based on statistical analysis of search engine query logs, this thesis derives general patterns of user access from the behavior of a large number of users and uses them to optimize the web page ranking algorithm, guiding the final ordering of results and improving ranking accuracy. The work covers two aspects: (1) Analysis of search engine user query logs. The thesis studies the characteristics of search logs and the relationships among them, summarizes some basic behavioral patterns of Chinese search engine users, and, by analyzing logs from different periods, identifies how their search behavior has changed over time. This provides a foundation for future analysis of search engine user behavior. (2) Optimization of Lucene's original ranking algorithm.
The original algorithm is a TF-IDF algorithm based on the vector space model. It considers only keyword frequency and how well a document matches the query, and ignores the characteristics of the web pages themselves. This thesis therefore designs a web page ranking algorithm that combines term-frequency matching with web page characteristics. Using a large volume of user query logs, it studies trends in user search behavior and adds a user-recognition ranking factor to the original algorithm; the weight coefficient of this factor can be adjusted to suit the needs of the search engine. This preserves the relevance and query match of the results while making their order better reflect what users need. The search engine system designed in this thesis implements the improvement through a boost factor, and compares the results of the original and the optimized ranking algorithms using user feedback. The results show that the optimized algorithm improves the ordering of returned results, and the work offers a reference for future research on the query intent of search engine users.
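The abstract does not give the exact scoring formula, so the following is only a minimal sketch of the general idea under stated assumptions: text relevance is TF-IDF cosine similarity in the vector space model, the user-recognition factor is assumed to be each page's click count from a `(query, doc_id)` log normalized to [0, 1], and it enters as a multiplicative boost scaled by an adjustable weight. The names `user_factor`, `rank`, and `weight`, the smoothed IDF, and the log format are hypothetical illustrations. This is plain Python, not Lucene's scoring API and not the author's implementation.

```python
import math
from collections import Counter


def idf_table(docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}


def tfidf(tokens, idf):
    """Sparse TF-IDF vector (term -> weight); terms absent from the corpus are dropped."""
    return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}


def cosine(a, b):
    """Cosine similarity of two sparse vectors: the vector space model match score."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def user_factor(click_log):
    """HYPOTHETICAL user-recognition factor: clicks per document in the
    (query, doc_id) log, normalized to [0, 1] by the most-clicked document."""
    clicks = Counter(doc_id for _query, doc_id in click_log)
    top = max(clicks.values(), default=0)
    return {doc_id: c / top for doc_id, c in clicks.items()} if top else {}


def rank(query_tokens, docs, click_log, weight=0.5):
    """score = cosine(query, doc) * (1 + weight * user_factor[doc]).
    `weight` is the adjustable coefficient; weight=0 gives plain TF-IDF ranking."""
    idf = idf_table(docs.values())
    q = tfidf(query_tokens, idf)
    uf = user_factor(click_log)
    scores = {}
    for doc_id, tokens in docs.items():
        relevance = cosine(q, tfidf(tokens, idf))
        scores[doc_id] = relevance * (1.0 + weight * uf.get(doc_id, 0.0))
    # Highest score first; sorted() is stable, so ties keep insertion order.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    docs = {
        "a": ["lucene", "ranking", "algorithm"],
        "b": ["lucene", "ranking", "algorithm"],  # same text as "a"
        "c": ["search", "log", "analysis"],
    }
    click_log = [("lucene ranking", "b"), ("lucene ranking", "b"), ("lucene", "a")]
    print([d for d, _ in rank(["lucene", "ranking"], docs, click_log, weight=0.0)])  # ['a', 'b', 'c']
    print([d for d, _ in rank(["lucene", "ranking"], docs, click_log, weight=0.5)])  # ['b', 'a', 'c']
```

With `weight=0.0` the two identically worded pages tie on text relevance; raising `weight` lets the log-derived factor break the tie in favor of the page users clicked more often. Because the boost is multiplicative, a page with zero text relevance stays at zero, so user behavior reorders relevant results rather than promoting unrelated ones.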
【Degree-granting institution】: Wuhan University of Technology
【Degree level】: Master's
【Year conferred】: 2013
【Classification number】: TP391.3