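Design and Implementation of a Clustering-Based Search Visualization and Presentation System
Published: 2018-03-15 00:14
Topic: search result clustering  Focus: visualization  Source: Beijing University of Posts and Telecommunications, 2013 master's thesis  Document type: degree thesis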
[Abstract]: With the spread of information technology and the Internet, search engine technology has developed rapidly. A traditional search engine returns results for the keywords a user enters and ranks them by relevance. However, because words in natural language are often ambiguous, the concept behind a user's query tends to be vague, so the results are scattered across topics and the user must spend time finding the topic of real interest. The need to process large numbers of search results effectively and reduce users' retrieval time has driven the development of meta-search engine technology and text clustering technology.
This thesis aims to improve and enhance search results by combining meta-search with text clustering. Meta-search is built on top of independent search engines: it aggregates and filters the results returned by each member search engine and presents the merged final results to the user. Search results differ from one another to varying degrees. If text clustering is applied to them and the results are presented hierarchically, with each cluster carrying a label that describes its topic and content, users can locate what they need more easily, which narrows the retrieval scope and shortens the time spent filtering results.
The main work of this thesis is the design and implementation of a clustering-based search result visualization tool. To build the tool, the thesis first proposes, on the basis of the results returned by existing search engines, a search result clustering method that incorporates user behavior. The method post-processes the search results, merges results with similar topics, and presents them to the user as clusters, helping the user quickly locate information of interest. At the same time, the method collects and analyzes information about users' access behavior and iteratively refines the clustering algorithm. Based on the requirements, the thesis describes the system's design schemes and overall architecture, including the design ideas and working principles of the main modules: search result acquisition, search result preprocessing, cluster generation, and user behavior analysis. It then describes in detail the implementation of each main module, the interaction design, and the workflow, and presents the final clustering output together with test results on clustering effectiveness and quality. Finally, it offers suggestions for the search result clustering visualization tool designed and implemented here and outlines directions for further research.
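The abstract does not say which clustering algorithm or labeling scheme the thesis uses. As a rough illustration of the general idea only (grouping result snippets by topic and giving each group a descriptive label), the sketch below assumes TF-IDF vectors, cosine similarity, and a simple single-pass threshold clustering. The function names, the stopword list, and the `threshold` and `label_terms` parameters are all hypothetical and are not taken from the thesis.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the thesis).
STOPWORDS = {"a", "an", "the", "of", "and", "for", "in", "to", "on", "with", "is"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many snippets each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF so that terms present in every snippet keep a small weight.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_results(snippets, threshold=0.2, label_terms=2):
    """Single-pass clustering: attach each snippet to the most similar
    existing cluster, or start a new cluster if none reaches `threshold`."""
    clusters = []  # each cluster: {"members": [snippet indices], "centroid": summed vector}
    for i, vec in enumerate(tfidf_vectors(snippets)):
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"members": [i], "centroid": dict(vec)})
        else:
            best["members"].append(i)
            for t, w in vec.items():
                best["centroid"][t] = best["centroid"].get(t, 0.0) + w
    labeled = []
    for c in clusters:
        # Label = highest-weighted centroid terms (ties broken alphabetically).
        top = sorted(c["centroid"].items(), key=lambda kv: (-kv[1], kv[0]))[:label_terms]
        labeled.append({"label": " ".join(t for t, _ in top), "members": c["members"]})
    return labeled


if __name__ == "__main__":
    # Made-up snippets for the ambiguous query "jaguar".
    results = [
        "Jaguar car dealer offers new luxury car models",
        "Jaguar big cat habitat in the rainforest",
        "Luxury car review: the new Jaguar sedan",
        "Rainforest wildlife: the jaguar cat hunts at night",
    ]
    for cluster in cluster_results(results):
        print(cluster["label"], cluster["members"])
```

A single-pass scheme is used here only because it is short and needs no preset number of clusters. The user-behavior feedback loop mentioned in the abstract is not modeled in this sketch.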
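[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree awarded]: 2013
[Classification number]: TP391.3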