Optimizing Document Collection and Scoring Strategies for Complex Boolean Queries
Published: 2019-06-22 16:42
【Abstract】: Although the Boolean query is one of the earliest concepts in information retrieval, most research on it has focused on queries that use a single, uniform Boolean operator. Complex Boolean queries have received little attention, even though they are used more and more frequently (for example, in text recommendation). To execute such queries more efficiently, this paper proposes DCQ (DAAT for complex query), a document collection and scoring strategy based on the DAAT (document-at-a-time) framework. In experiments against Lucene, a well-known open-source search engine, DCQ significantly improves query performance. The paper also proposes a regression-based mechanism for predicting query performance, which can decide fairly accurately when DCQ should be used. Experiments show that the combined algorithm, which pairs DCQ with the performance predictor, clearly outperforms Lucene's current document collection and scoring algorithm.
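The abstract does not describe how DCQ works internally. As a generic illustration of the DAAT framework it builds on, here is a minimal Python sketch, assuming postings are lists of `(doc_id, score)` pairs sorted by `doc_id` and that a document's score is simply the sum of the weights of its matching terms. The names `TermIter`, `AndIter`, `OrIter` and `daat_top_k` are hypothetical, and `advance(target)` only loosely mirrors the contract of Lucene's `DocIdSetIterator.advance`; this is neither Lucene's code nor the DCQ algorithm.

```python
import bisect
import heapq
from typing import List, Tuple

# Sentinel meaning "exhausted" (assumption: mirrors Lucene's NO_MORE_DOCS idea).
NO_MORE_DOCS = 2**31 - 1


class TermIter:
    """Cursor over one term's postings, sorted by doc id."""

    def __init__(self, postings: List[Tuple[int, float]]):
        self.docs = [d for d, _ in postings]
        self.scores = [s for _, s in postings]
        self.i = 0

    def doc(self) -> int:
        return self.docs[self.i] if self.i < len(self.docs) else NO_MORE_DOCS

    def advance(self, target: int) -> int:
        # Move to the first posting with doc id >= target; never moves backward.
        self.i = bisect.bisect_left(self.docs, target, lo=self.i)
        return self.doc()

    def score(self) -> float:
        return self.scores[self.i]


class AndIter:
    """Conjunction: matches docs on which every child is positioned."""

    def __init__(self, children):
        self.children = children
        self._doc = self._align(max(c.doc() for c in children))

    def _align(self, target: int) -> int:
        # Leapfrog: push all children to the largest doc any of them reports.
        while target != NO_MORE_DOCS:
            hi = max(c.advance(target) for c in self.children)
            if hi == target:  # every child landed exactly on target
                return target
            target = hi
        return NO_MORE_DOCS

    def doc(self) -> int:
        return self._doc

    def advance(self, target: int) -> int:
        if target > self._doc:
            self._doc = self._align(target)
        return self._doc

    def score(self) -> float:
        return sum(c.score() for c in self.children)


class OrIter:
    """Disjunction: current doc is the smallest doc among the children."""

    def __init__(self, children):
        self.children = children

    def doc(self) -> int:
        return min(c.doc() for c in self.children)

    def advance(self, target: int) -> int:
        for c in self.children:
            c.advance(target)
        return self.doc()

    def score(self) -> float:
        d = self.doc()
        # Only children sitting on the current doc contribute to its score.
        return sum(c.score() for c in self.children if c.doc() == d)


def daat_top_k(root, k: int) -> List[Tuple[int, float]]:
    """Score matching docs one at a time in doc-id order, keeping the top k."""
    heap: List[Tuple[float, int]] = []  # min-heap of (score, doc)
    d = root.doc()
    while d != NO_MORE_DOCS:
        heapq.heappush(heap, (root.score(), d))
        if len(heap) > k:
            heapq.heappop(heap)  # drop the current lowest-scoring doc
        d = root.advance(d + 1)
    return [(doc, s) for s, doc in sorted(heap, reverse=True)]
```

For example, `A AND (B OR C)` becomes `AndIter([TermIter(a), OrIter([TermIter(b), TermIter(c)])])`. Because an OR node is itself a cursor, it can sit inside an AND node, so a query that mixes operators (the kind the abstract contrasts with uniform-operator queries) is evaluated in the same single pass over document ids.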
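【Affiliation】: School of Electronics Engineering and Computer Science, Peking University
【Funding】: National Basic Research Program of China (973 Program); National Natural Science Foundation of China
【CLC number】: TP391.3
Article ID: 2504778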
Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/2504778.html