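Research and Application of Incremental Mining Algorithms for Association Rules
Posted: 2018-05-28 21:44
Topic: incremental association rule mining + FUP; Source: Anhui University, master's thesis, 2013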
[Abstract]: How to obtain differentiated, personalized information from large volumes of data is an active research topic in information retrieval. Work in this area mainly covers meta-search engines and query expansion. A meta-search engine, which merges the results returned by several search engines, focuses on giving users more query results; query expansion turns the short query a user submits into a larger set of keywords so that the results match the user's needs more closely.

Association rule mining is an important research direction in data mining and also an important method for query expansion. This thesis proposes an improved incremental mining algorithm for association rules and, by combining a meta-search engine with query expansion based on those rules, proposes the concept of a personalized meta-search engine.

The thesis first discusses the incremental association rule mining algorithm used for query expansion. It analyzes the factors that affect mining efficiency when incremental mining is performed on an FP-Tree-based structure, and the strategy FUFP uses to update the FP-Tree quickly for incremental mining. It then introduces the idea of FUP, a typical Apriori-based incremental mining algorithm, into the fast update of the TD-FP-Tree used by the TD-FP-Growth algorithm, and proposes a fast TD-FP-Tree update algorithm (PFU-TDFP). By classifying all affected items and handling each class separately, the algorithm reduces both the likelihood and the number of scans of the original transaction database. When an item changes from infrequent to frequent, it reduces the number of transactions whose items must be re-sorted, and some steps are parallelized to further improve efficiency. Experiments show that the proposed algorithm not only updates the TD-FP-Tree quickly but also improves overall mining efficiency compared with incremental mining based on the FP-Tree structure.
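The item classification mentioned above can be illustrated with the classic FUP-style case analysis. The sketch below is not the thesis's PFU-TDFP algorithm; it only shows, under stated assumptions, why classifying items limits scans of the original database. The function name `classify_items`, its parameters, and the sample data are hypothetical. It assumes the support counts of the originally frequent items were stored after the previous mining run.

```python
from collections import Counter


def classify_items(orig_frequent_counts, orig_size, increment, min_support):
    """Classify items after new transactions arrive (FUP-style case analysis).

    orig_frequent_counts: item -> support count in the original database,
                          stored only for items that were frequent there.
    orig_size:            number of transactions in the original database.
    increment:            list of new transactions (each a list of items).
    min_support:          minimum support as a ratio in (0, 1].
    """
    # Count each item once per new transaction.
    inc_counts = Counter(item for tx in increment for item in set(tx))
    inc_size = len(increment)
    total_threshold = min_support * (orig_size + inc_size)
    inc_threshold = min_support * inc_size

    still_frequent = {}    # updated counts, no rescan of the original data
    now_infrequent = set() # originally frequent, dropped below the threshold
    needs_rescan = set()   # originally infrequent, frequent in the increment

    # Originally frequent items: the stored count plus the increment count
    # decides the outcome, so the original database is not touched.
    for item, old_count in orig_frequent_counts.items():
        total = old_count + inc_counts.get(item, 0)
        if total >= total_threshold:
            still_frequent[item] = total
        else:
            now_infrequent.add(item)

    # Originally infrequent items: only those frequent in the increment can
    # become frequent overall, and only they require the original counts.
    for item, count in inc_counts.items():
        if item not in orig_frequent_counts and count >= inc_threshold:
            needs_rescan.add(item)

    # Items infrequent in both parts cannot be frequent overall; ignore them.
    return still_frequent, now_infrequent, needs_rescan


if __name__ == "__main__":
    stored = {"a": 6, "b": 5}  # hypothetical counts from 10 transactions
    new_tx = [["a", "c"], ["c", "d"], ["a", "c"], ["c"]]
    print(classify_items(stored, 10, new_tx, 0.5))
```

Only the items returned in `needs_rescan` force a look at the original transactions, which is the kind of saving the abstract attributes to classifying items before updating the tree.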
The PFU-TDFP algorithm is then used to mine users' habits in browsing search results, and the mined patterns drive query expansion so that the expanded keyword set reflects each user's professional background and interests. Combining this with a meta-search engine, the thesis proposes the concept of a personalized meta-search engine. For result fusion in the meta-search engine, it proposes a new scoring model based on local similarity features of the search results, such as rank, title, and summary. Finally, a system prototype is implemented. Experiments on the prototype show that PFU-TDFP can quickly update the mined user search and browsing habits, and that, evaluated with the P@N method, the proposed result-fusion scoring formula provides users with more personalized search results.
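For readers unfamiliar with P@N (precision at N), a minimal sketch follows. It uses the standard definition, the fraction of the top N ranked results that are relevant; the abstract does not say how the thesis judged relevance, so the function name `precision_at_n` and the sample URLs are hypothetical.

```python
def precision_at_n(ranked_results, relevant, n):
    """Return the fraction of the top-n ranked results that are relevant.

    ranked_results: result identifiers in ranked order, best first.
    relevant:       set of identifiers judged relevant for the query.
    n:              cutoff rank; must be positive.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    top_n = ranked_results[:n]
    hits = sum(1 for result in top_n if result in relevant)
    # Standard P@N divides by n even when fewer than n results are returned.
    return hits / n


if __name__ == "__main__":
    ranking = ["u1", "u2", "u3", "u4", "u5"]  # hypothetical fused ranking
    judged_relevant = {"u1", "u3", "u9"}      # hypothetical judgments
    print(precision_at_n(ranking, judged_relevant, 5))
```

Comparing this value for rankings produced with and without a fusion scoring formula, at the same cutoff N, is one common way to run the kind of evaluation the abstract mentions.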
[Degree-granting institution]: Anhui University
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP311.13