
Research and Implementation of Key Technologies for Meta-Search

Published: 2018-11-05 12:04
[Abstract]: An analysis of mainstream search engine technologies and products shows that current search engines meet, to some extent, users' need to retrieve useful information efficiently. At the same time, surveys of user satisfaction with existing search engine products reveal that the technology still has many shortcomings, mainly in two respects: first, the comprehensiveness of search results, usually called recall; second, the relevance of search results, usually called precision.

Many approaches have been proposed to address the recall and precision problems of standalone search engines, and meta-search is a representative one. Meta-search overcomes some of the shortcomings of standalone search engines, but its results still leave considerable room for improvement.

A meta-search system has three main components: user input processing, candidate search engine scheduling, and result processing. Current research on meta-search is organized around these same three components.
1) In user input processing, the main problems are the vagueness of the user's query goal and the ambiguity of the user's input. Proposed remedies include keyword suggestion, word segmentation, Agent-based user interest models, HowNet-based disambiguation and keyword selection, and ontology-based input processing.
2) In candidate search engine scheduling, the problems are how to select candidate search engines and which scheduling strategy to use. Proposed methods include the rough-representative method, the detailed-representative method, quantitative algorithms, static learning, dynamic learning, hybrid static/dynamic learning, and ontology-based personalized scheduling.
3) In result processing, the two main problems are duplicate removal and ranking. Duplicate removal is typically based on the URL, the document title, the document snippet, or a combination of the three; ranking methods include position-based ranking, relevance-based ranking, and ontology-based personalized ranking.

With high precision, high recall, and fast system response as its goals, this thesis analyzes and addresses existing problems of meta-search engines along the three main components of a meta-search system:
1) For user input processing, it combines forward and reverse word segmentation so that the user's keyword set fully reflects the search intent, and it combines a long-term and a short-term user interest tree, which keeps the user's interest categories current while improving the performance of interest-category lookup.
2) For candidate search engine scheduling, it selects engines by combining the user's interest categories with an analysis of the user's search behavior, aiming to pick the few candidate engines most likely to return the results the user wants. It also uses a double-buffering mechanism that combines an in-memory cache with a local database cache to shorten the system's response time to user queries.
3) For the results returned by the candidate engines, duplicate removal does not simply filter out repeated results; instead, it assigns each result a weight during deduplication, which raises that result's ranking score and therefore its position among all returned results.

These improvements to each major component raise both the precision and the recall of the system's results. Combining GD-FNN with the user interest index tree improves user satisfaction with search results whether the user is new to the system or has used it many times, and the double-buffer scheduling technique reduces the time of a single search to the order of ten milliseconds.
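The abstract states that forward and reverse word segmentation are combined so the keyword set fully reflects the user's intent, but it does not give the algorithm. The sketch below assumes classic forward and backward maximum matching over a dictionary, and assumes "combining" means taking the union of the multi-character words from both passes. The toy dictionary and all function names are hypothetical.

```python
# Toy dictionary for illustration only (an assumption); a real system would load a full lexicon.
DICTIONARY = {"元搜索", "搜索", "搜索引擎", "引擎", "关键", "关键技术", "技术", "研究"}
MAX_WORD_LEN = max(len(w) for w in DICTIONARY)


def forward_max_match(text, dictionary=DICTIONARY, max_len=MAX_WORD_LEN):
    """Scan left to right, taking the longest dictionary word at each position."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when no dictionary word matches.
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text, dictionary=DICTIONARY, max_len=MAX_WORD_LEN):
    """Scan right to left, taking the longest dictionary word ending at each position."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                words.insert(0, piece)
                j -= size
                break
    return words


def keyword_set(text):
    """Hypothetical combination step: union of multi-character words from both passes."""
    return {w for w in forward_max_match(text) + backward_max_match(text) if len(w) > 1}
```

For 元搜索引擎, forward matching yields 元搜索 / 引擎 while backward matching yields 元 / 搜索引擎; the union keeps 元搜索, 搜索引擎, and 引擎, so neither reading of the query is lost.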
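The abstract combines a long-term and a short-term user interest tree but does not describe their structure or how they are merged. As a rough sketch under stated assumptions: categories are paths such as ("computing", "search engines"), every prefix of a path is a node of an implicit tree, the short-term record decays at the end of each session, and the two are blended with a fixed weight. The class name, `decay`, and `short_weight` are hypothetical.

```python
class InterestProfile:
    """Hypothetical blend of long-term and short-term interest over a category tree.

    Each prefix of a category path is treated as a tree node, so interest
    recorded for a leaf also counts toward its ancestors.
    """

    def __init__(self, decay=0.5, short_weight=0.6):
        self.long_term = {}   # accumulates across all sessions
        self.short_term = {}  # decays at the end of every session
        self.decay = decay
        self.short_weight = short_weight

    def record(self, category_path):
        """Count one visit to a category and to all of its ancestors."""
        for depth in range(1, len(category_path) + 1):
            node = tuple(category_path[:depth])
            self.long_term[node] = self.long_term.get(node, 0.0) + 1.0
            self.short_term[node] = self.short_term.get(node, 0.0) + 1.0

    def end_session(self):
        """Age the short-term interests so recent behaviour dominates them."""
        for node in self.short_term:
            self.short_term[node] *= self.decay

    def score(self, category_path):
        """Weighted blend of short-term and long-term interest for one node."""
        node = tuple(category_path)
        return (self.short_weight * self.short_term.get(node, 0.0)
                + (1 - self.short_weight) * self.long_term.get(node, 0.0))
```

After one session about ("computing", ...) and a newer one about ("agriculture", ...), the newer category scores higher even though both have the same long-term count, which is the timeliness property the abstract attributes to the short-term tree.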
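Candidate engines are chosen by combining the user's interest categories with an analysis of the user's search behavior. The thesis does not specify the scoring, so the following is only an assumed linear blend: how well each engine covers the user's interests, plus a smoothed click-through rate taken from the user's history. The engine names, per-category quality values, and the `alpha` weight are hypothetical.

```python
def select_engines(user_interests, engine_strength, click_stats, k=2, alpha=0.7):
    """Return the k engines with the highest hypothetical combined score.

    user_interests:  {category: weight}, e.g. taken from an interest profile
    engine_strength: {engine: {category: quality in [0, 1]}}
    click_stats:     {engine: (clicks, impressions)} from the user's history
    """
    scores = {}
    for engine, strengths in engine_strength.items():
        # How well this engine covers the categories the user cares about.
        interest_match = sum(w * strengths.get(cat, 0.0)
                             for cat, w in user_interests.items())
        # Laplace-smoothed click-through rate, so unseen engines get 0.5.
        clicks, shown = click_stats.get(engine, (0, 0))
        ctr = (clicks + 1) / (shown + 2)
        scores[engine] = alpha * interest_match + (1 - alpha) * ctr
    return sorted(scores, key=scores.get, reverse=True)[:k]


interests = {"agriculture": 0.8, "computing": 0.2}
strengths = {
    "engine_a": {"agriculture": 0.9},
    "engine_b": {"computing": 0.9},
    "engine_c": {"agriculture": 0.5, "computing": 0.5},
}
history = {"engine_a": (10, 20), "engine_b": (1, 20), "engine_c": (5, 20)}
top = select_engines(interests, strengths, history, k=2)
```

Querying only the top-k engines, rather than all of them, is what keeps the number of outbound requests per search small.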
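The double-buffering mechanism puts an in-memory cache in front of a local database cache, in front of the live search engines. A minimal sketch, assuming an LRU policy for the memory layer and an SQLite table for the database layer (neither is specified in the abstract); `TwoLevelCache` and the `fetch` callback that stands in for querying the engines are hypothetical.

```python
import sqlite3
from collections import OrderedDict


class TwoLevelCache:
    """Memory LRU cache backed by a local SQLite table, backed by the engines."""

    def __init__(self, fetch, db_path=":memory:", capacity=128):
        self.fetch = fetch              # hypothetical call that queries the engines
        self.memory = OrderedDict()     # level 1: most recently used at the end
        self.capacity = capacity
        self.db = sqlite3.connect(db_path)  # level 2: persistent local store
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS results (query TEXT PRIMARY KEY, payload TEXT)")

    def _remember(self, query, payload):
        """Insert into the memory layer, evicting the least recently used entry."""
        self.memory[query] = payload
        self.memory.move_to_end(query)
        if len(self.memory) > self.capacity:
            self.memory.popitem(last=False)

    def get(self, query):
        """Return (payload, layer that served it)."""
        if query in self.memory:
            self.memory.move_to_end(query)
            return self.memory[query], "memory"
        row = self.db.execute(
            "SELECT payload FROM results WHERE query = ?", (query,)).fetchone()
        if row is not None:
            self._remember(query, row[0])
            return row[0], "database"
        payload = self.fetch(query)
        self.db.execute("INSERT OR REPLACE INTO results VALUES (?, ?)", (query, payload))
        self.db.commit()
        self._remember(query, payload)
        return payload, "engines"
```

A repeated query is answered from memory; a query evicted from memory but seen before is answered from the database; only a new query reaches the engines.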
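Instead of simply discarding duplicates, the thesis assigns weights during deduplication so that results returned by several engines rank higher. The exact weighting is not given; this sketch assumes URL-only duplicate detection (titles and snippets, also mentioned in the abstract, are ignored here), a reciprocal-rank score per engine, and an optional per-engine weight. `normalize_url` and `merge_results` are hypothetical names.

```python
from urllib.parse import urlsplit


def normalize_url(url):
    """Assumed normalization: ignore scheme, case of host, leading 'www.', trailing '/'."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    query = "?" + parts.query if parts.query else ""
    return host + path + query


def merge_results(engine_results, engine_weight=None):
    """Deduplicate by URL while accumulating a position-based score.

    engine_results: {engine: [url, ...]} in each engine's own rank order.
    A result found by several engines collects score from each of them.
    """
    engine_weight = engine_weight or {}
    merged = {}
    for engine, urls in engine_results.items():
        weight = engine_weight.get(engine, 1.0)
        seen = set()
        for rank, url in enumerate(urls, start=1):
            key = normalize_url(url)
            if key in seen:  # ignore repeats within the same engine's list
                continue
            seen.add(key)
            entry = merged.setdefault(key, {"url": url, "score": 0.0, "engines": []})
            entry["score"] += weight / rank
            entry["engines"].append(engine)
    return sorted(merged.values(), key=lambda e: e["score"], reverse=True)


ranked = merge_results({
    "engine_a": ["http://www.example.com/page/", "http://foo.org/x"],
    "engine_b": ["https://example.com/page", "http://bar.net/y"],
})
```

Here the page returned first by both engines ends up with score 2.0, while results found by only one engine keep 0.5, so deduplication and ranking happen in a single pass.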
[Degree-granting institution]: Nanjing Agricultural University
[Degree level]: Master's
[Year of degree]: 2012
[Classification number]: TP391.3

