Design and Implementation of a Personalized Search Engine Based on Solr

Published: 2018-04-10 19:30

Topics: personalization + user model; Source: Beijing University of Chemical Technology, master's thesis, 2012

[Abstract]: With the rapid development of Internet technology, the volume of information on the Internet keeps growing, and how to use these information resources more effectively has become a major research topic. Information on the Internet comes from a wide range of sources and usually exists in different forms. For such massive information of differing origin and form, finding what one needs accurately and quickly is a problem everyone faces when using the Internet. The emergence of search engines has alleviated this problem to some extent. However, as the pace of life quickens, the demands on search engine performance and search efficiency keep rising, so how to let users retrieve the information they want faster and more accurately has drawn the attention of a growing number of researchers. Personalized search engines emerged against this background: they apply personalization techniques to personalize search results. The system in this thesis builds a user model. It collects user information, extracts keywords from that information, computes the weight of each keyword with the TF-IDF algorithm, and represents the user model as a vector of keywords and their weights. The main work of this thesis is as follows:
1. Based on the working principles, key technologies, and workflow of search engines, the search engine framework of the system is designed.
2. Based on the principles of web crawlers and of the Heritrix crawler framework, a web crawler suited to the system is designed and used to crawl web pages from the Internet.
3. Based on the working principles of the open-source framework Solr, the indexing system and retrieval system required by the thesis are built.
4. A user model is established and a personalized search engine system is built with Solr. Experiments show that the system considerably improves the accuracy of search results and better matches users' needs.
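The user model described in the abstract (keywords weighted by TF-IDF and stored as a vector) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `build_user_model`, the pre-tokenized input format, the `top_k` cut-off, and the smoothed IDF variant `log((1 + N) / (1 + df)) + 1` are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
from collections import Counter


def build_user_model(user_docs, corpus_docs, top_k=5):
    """Return {keyword: weight} for one user (hypothetical helper).

    user_docs:   token lists collected from the user (assumed format).
    corpus_docs: token lists of a background collection used for IDF.
    """
    n_docs = len(corpus_docs)

    # Document frequency: number of corpus documents containing each term.
    df = Counter()
    for doc in corpus_docs:
        df.update(set(doc))

    # Term frequency over all of the user's text, normalised by its length.
    counts = Counter(tok for doc in user_docs for tok in doc)
    total = sum(counts.values())
    if total == 0:
        return {}

    weights = {}
    for term, count in counts.items():
        tf = count / total
        # Smoothed IDF (an assumed variant; the thesis may use another).
        idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
        weights[term] = tf * idf

    # Keep the top_k heaviest keywords; ties are broken alphabetically.
    top = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
    return dict(top)


if __name__ == "__main__":
    corpus = [
        ["solr", "index", "search"],
        ["search", "engine", "crawler"],
        ["football", "score", "search"],
    ]
    user = [["solr", "index", "solr"], ["search", "solr"]]
    print(build_user_model(user, corpus))
```

The returned dictionary plays the role of the keyword-weight vector: terms the user mentions often but that are rare in the background collection receive the highest weights. How the thesis then applies this vector to Solr results is not stated in the abstract, so that step is left out here.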
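[Degree-granting institution]: Beijing University of Chemical Technology
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP391.3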
