Research on SLCS-Based Duplicate Removal for Meta-Search
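Published: 2018-01-20 04:51
Keywords: web page deduplication; meta-search engine; LCS; feature code. Source: 《图书情报工作》 (Library and Information Service), 2010, No. 15. Paper type: journal article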
【Abstract】: To address duplicate web pages in meta-search results, this paper applies the web page deduplication method based on the Longest Common Subsequence (LCS) to meta-search engines and proposes an SLCS-based meta-search deduplication method, where the initial S stands for Summary. After the summaries of the result pages are obtained, the weight of each sentence in a summary's sentence set is computed from the number of times the query terms occur in that sentence and from the sentence's length, and the highest-weighted sentence is extracted as the summary's feature sentence. The similarity of two result pages is then computed by comparing the LCS of their feature sentences, with the aim of improving the retrieval quality of the meta-search engine. Experiments show that the method achieves relatively high accuracy.
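The pipeline in the abstract (weight the summary sentences, keep the top one, compare feature sentences by LCS) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no exact formulas, so the sentence-splitting punctuation, the weight `sentence_weight` (query-term occurrences divided by sentence length), the normalization of the LCS length by the longer feature sentence, and the duplicate threshold `0.8` are all hypothetical choices, as are the function names.

```python
import re
from typing import List


def split_sentences(summary: str) -> List[str]:
    # Assumed rule: split on common Chinese and ASCII sentence punctuation.
    parts = re.split(r"[。!?;!?;.]", summary)
    return [p.strip() for p in parts if p.strip()]


def sentence_weight(sentence: str, query_terms: List[str]) -> float:
    # Hypothetical weight: total query-term occurrences / sentence length.
    hits = sum(sentence.count(term) for term in query_terms)
    return hits / len(sentence)


def feature_sentence(summary: str, query_terms: List[str]) -> str:
    # Pick the highest-weighted sentence (first one wins on ties).
    sentences = split_sentences(summary)
    if not sentences:
        return ""
    return max(sentences, key=lambda s: sentence_weight(s, query_terms))


def lcs_length(a: str, b: str) -> int:
    # Standard LCS dynamic programming, keeping only one previous row.
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                cur[j] = prev[j - 1] + 1
            else:
                cur[j] = max(prev[j], cur[j - 1])
        prev = cur
    return prev[-1]


def slcs_similarity(summary_a: str, summary_b: str, query_terms: List[str]) -> float:
    fa = feature_sentence(summary_a, query_terms)
    fb = feature_sentence(summary_b, query_terms)
    if not fa or not fb:
        return 0.0
    # Assumed normalization: LCS length over the longer feature sentence.
    return lcs_length(fa, fb) / max(len(fa), len(fb))


def is_duplicate(summary_a: str, summary_b: str, query_terms: List[str],
                 threshold: float = 0.8) -> bool:
    # Hypothetical threshold for treating two result pages as duplicates.
    return slcs_similarity(summary_a, summary_b, query_terms) >= threshold
```

Comparing characters rather than words keeps the sketch usable on Chinese summaries without a word segmenter. Because only one short feature sentence per page enters the quadratic LCS step, the cost stays small compared with running LCS over full page texts.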
【Author affiliation】: College of Information Science and Engineering, Henan University of Technology
【CLC number】: TP391.3
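【Text snapshot】: A meta-search engine dispatches the user's query to multiple independent member search engines and fuses the returned results, which can better meet users' query needs [1]. However, the query results contain a certain amount of duplication, which seriously degrades their quality. Therefore, how to efficiently remove duplicate web pages from the query results of a meta-search engine is …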
Article ID: 1446916
Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/1446916.html