Research and Design of a Personalized Retrieval System Based on Classification Technology
[Abstract]: With the rapid development of the Internet and network information technology, network resources are growing exponentially. A traditional general-purpose search engine bases its results on the query keywords alone, yet different users may submit the same query words for different purposes and expect different results. Users therefore need a search tool that returns more accurate results according to their individual characteristics. This thesis proposes a user-centered personalized search engine based on classification. Building on an analysis of personalized information retrieval, it studies the techniques commonly used in personalized search engines and the main techniques for understanding the purpose behind a user's search. A model of the retrieval system is established from users' browsing and query logs. The thesis introduces automatic text classification, presents several common text representation models, and uses WEKA and LibSVM to classify text automatically. On top of text classification, a ranking algorithm is proposed that displays as many categories as possible in the retrieval results, so that users interested in different categories can each find information in the corresponding subject category. In addition, user behavior characteristics, namely the user's click rate for each topic category and the average visit time on web pages of each topic category, are used to modify the Lucene scoring field and thereby change Lucene's own ranking score for documents. Experiments show that, once these behavior characteristics are taken into account, users with different interests who query the same words receive different result pages.
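The two ranking ideas described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract gives no formulas, so the boost formula, the `alpha` weight, the 300-second dwell cap, and every function name here are assumptions chosen for illustration, and plain tuples stand in for Lucene hits.

```python
from collections import defaultdict, deque


def behavior_boost(click_rate, avg_dwell_seconds, max_dwell_seconds=300.0, alpha=0.5):
    """Hypothetical boost in [0, 1]: a weighted mix of click rate and capped dwell time."""
    dwell = min(avg_dwell_seconds, max_dwell_seconds) / max_dwell_seconds
    return alpha * click_rate + (1.0 - alpha) * dwell


def personalize(hits, profile):
    """Rescale each base score by (1 + boost) for the hit's category, then sort.

    hits: list of (doc_id, category, base_score) tuples
    profile: {category: (click_rate, avg_dwell_seconds)} for one user
    """
    rescored = []
    for doc_id, category, score in hits:
        click_rate, dwell = profile.get(category, (0.0, 0.0))
        boosted = score * (1.0 + behavior_boost(click_rate, dwell))
        rescored.append((doc_id, category, boosted))
    return sorted(rescored, key=lambda h: h[2], reverse=True)


def diversify(hits):
    """Round-robin over categories so the top results cover as many categories as possible."""
    queues = defaultdict(deque)
    order = []  # categories in order of their best-scoring hit
    for hit in sorted(hits, key=lambda h: h[2], reverse=True):
        if hit[1] not in queues:
            order.append(hit[1])
        queues[hit[1]].append(hit)
    result = []
    while any(queues[c] for c in order):
        for c in order:
            if queues[c]:
                result.append(queues[c].popleft())
    return result
```

With a profile that favors one category, `personalize` moves that category's documents up; `diversify` interleaves categories so that a single dominant category cannot fill the first page.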
Because a large share of search queries are repeated (by the 80/20 rule, about 20% of search terms account for 80% of all searches), the system caches query results. When a user submits a query consisting of a set of keywords, the system checks whether a record for that query exists in the cache. If it does not, the query is submitted to the searcher; the sequence of document numbers returned by the searcher is stored in a file, and the offset of that sequence within the file is saved in the cache. If the record already exists, the offset of the stored sequence is read from the cache. The thesis then presents the design and implementation of a prototype system. It first gives the complete system architecture, then describes the main modules in detail, including the retrieval module, the result ranking module, and the query cache module, and analyzes the main data structures used in the system. The system is then tested and analyzed to verify its feasibility. Finally, the thesis summarizes the work, points out some defects of the system, proposes improvements to the system as a whole, and outlines plans for future work.
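The query cache described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the thesis's code: the `QueryCache` class, the `search_fn` callback standing in for the searcher, the key normalization, and the binary record layout (a 4-byte count followed by 4-byte document numbers) are all hypothetical.

```python
import struct


class QueryCache:
    """Maps a normalized query to the file offset of its stored document-number sequence."""

    def __init__(self, path, search_fn):
        self.path = path            # file holding the stored result sequences
        self.search_fn = search_fn  # hypothetical stand-in for the searcher
        self.offsets = {}           # in-memory cache: query key -> offset in file
        open(path, "ab").close()    # make sure the file exists

    def _key(self, keywords):
        # Assumption: keyword order and letter case do not change the query.
        return " ".join(sorted(k.lower() for k in keywords))

    def lookup(self, keywords):
        key = self._key(keywords)
        if key not in self.offsets:
            # Cache miss: run the search, append the result, remember its offset.
            doc_ids = list(self.search_fn(keywords))
            with open(self.path, "ab") as f:
                f.seek(0, 2)  # move to end of file before reading the position
                self.offsets[key] = f.tell()
                f.write(struct.pack("<I", len(doc_ids)))
                f.write(struct.pack(f"<{len(doc_ids)}I", *doc_ids))
            return doc_ids
        # Cache hit: read the stored sequence back from the saved offset.
        with open(self.path, "rb") as f:
            f.seek(self.offsets[key])
            (count,) = struct.unpack("<I", f.read(4))
            return list(struct.unpack(f"<{count}I", f.read(4 * count)))
```

Keeping only offsets in memory keeps the cache small even when result lists are long, at the cost of one file read per cache hit.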
[Degree-granting institution]: Wuhan University of Technology
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP391.3