Research and Implementation of Distributed Search Diversity Based on Vertical Domains

Published: 2018-07-17 06:55
[Abstract]: Since the start of the 21st century, information technology and computer networks have advanced rapidly, and massive data volumes and information overload have made it increasingly difficult for users to retrieve the content they care about. As the pressure on information storage keeps growing, distributed systems have emerged in response, bringing a series of new challenges to traditional retrieval systems and search engines. Part of the challenge comes from the diversity inherent in user search requests: a retrieval system must quickly and accurately determine which vertical domain the requested information belongs to, that is, satisfy the diversity of user queries, while also ensuring that the returned information is correct enough to cover the user's needs. Combining distributed search with diversity is therefore a way to address many of these challenges. Building on the architecture of current distributed search engines, this thesis proposes several feasible algorithms for vertical domain selection, resource selection, and result merging that take the diversity of retrieved information into account, in order to provide users with more targeted service. The main research work is as follows:

(1) Vertical domain selection. The thesis proposes two vertical domain selection algorithms, a word-vector judgment method and an expanded-vocabulary ranking method. While the query terms are expanded, keywords are extracted for each vertical domain, and verticals are selected according to the similarity between the two. Experiments show that both algorithms improve precision and recall to some degree over earlier vertical domain selection methods.

(2) Resource selection. The thesis proposes two ways of describing a resource base, an LDA topic description and a TF-IDF resource description, and builds a resource selection framework on top of them. The framework also uses the vertical domain selection results when choosing resource bases for a user's query. Experiments show that the proposed resource selection algorithm can be applied effectively in the real, complex distributed environment of a web search engine and achieves excellent results.

(3) Result merging. Based on the characteristics of vertical domains and the diversity of query terms, the thesis proposes a result merging framework that computes features along three dimensions: document, resource base, and vertical domain. The framework uses an improved CORI algorithm together with a linear fusion algorithm to compute the final merged score. The algorithm reflects both the diversity and the accuracy of query results and, compared with existing methods, performs well and improves considerably on precision, recall, and nDCG.

Building on this work, the thesis verifies that the algorithms proposed for these three areas effectively improve the system's accuracy while preserving diversified results, showing that the system can meet users' needs for multi-faceted queries.
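The result merging step in (3) can be illustrated with a small sketch. The abstract does not give the improved CORI formula or the feature weights used in the thesis, so what follows is only an assumption-laden illustration: it uses the classic CORI merging heuristic, D'' = (D' + 0.4 · D' · C') / 1.4, and combines it linearly with a resource score and a vertical score. The names `Hit`, `fuse`, the weights, and all sample scores are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    doc_score: float  # raw score from the resource's own ranker
    resource: str     # resource base (collection) that returned the hit
    vertical: str     # vertical domain the resource belongs to


def min_max(value, lo, hi):
    """Scale value into [0, 1]; return 0.0 when the range is degenerate."""
    return (value - lo) / (hi - lo) if hi > lo else 0.0


def cori_merge_score(doc_norm, resource_norm):
    """Classic CORI merging heuristic (not the thesis's improved variant)."""
    return (doc_norm + 0.4 * doc_norm * resource_norm) / 1.4


def fuse(hits, resource_scores, vertical_scores, weights=(0.6, 0.2, 0.2)):
    """Rank hits by a linear mix of document, resource and vertical evidence.

    The weights are hypothetical; the thesis does not publish its values here.
    """
    w_doc, w_res, w_vert = weights
    r_lo, r_hi = min(resource_scores.values()), max(resource_scores.values())

    # Raw document scores from different rankers are not comparable,
    # so each resource's scores are normalized within that resource.
    ranges = {}
    for h in hits:
        lo, hi = ranges.get(h.resource, (h.doc_score, h.doc_score))
        ranges[h.resource] = (min(lo, h.doc_score), max(hi, h.doc_score))

    fused = []
    for h in hits:
        d = min_max(h.doc_score, *ranges[h.resource])
        r = min_max(resource_scores[h.resource], r_lo, r_hi)
        v = vertical_scores.get(h.vertical, 0.0)
        score = w_doc * cori_merge_score(d, r) + w_res * r + w_vert * v
        fused.append((h.doc_id, score))
    return sorted(fused, key=lambda pair: pair[1], reverse=True)


# Made-up example: two resources from two verticals.
hits = [
    Hit("a", 10.0, "news1", "news"),
    Hit("b", 5.0, "news1", "news"),
    Hit("c", 0.9, "shop1", "shopping"),
    Hit("d", 0.1, "shop1", "shopping"),
]
resource_scores = {"news1": 0.8, "shop1": 0.3}
vertical_scores = {"news": 0.9, "shopping": 0.4}
ranked = fuse(hits, resource_scores, vertical_scores)
```

With these made-up inputs, the top hit from the weaker shopping resource is ranked above the second news hit, which is the kind of cross-vertical interleaving that a diversity-aware merge aims for. Changing the weights shifts the balance between per-document relevance and resource or vertical evidence.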
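[Degree-granting institution]: South China University of Technology (华南理工大学)
[Degree level]: Master's
[Year degree conferred]: 2016
[Classification number]: TP391.3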

[Similar literature]

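Related master's theses (top 1)

1 Xie Yifan (谢一帆); Research and Implementation of Distributed Search Diversity Based on Vertical Domains [D]; South China University of Technology; 2016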



Article ID: 2129494

Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/2129494.html

