Research on Query Expansion Techniques Based on Association Rules
[Abstract]: With the rapid growth of information on the web, it remains difficult to find exactly the information one wants through a search engine. Low recall and low precision are pressing problems that search engines need to solve. To address them, and following Van Rijsbergen's view that retrieval performance can be improved by modifying the original query, this thesis studies query expansion techniques based on association rules. The main contents are as follows:
1. The thesis first introduces the background in detail: data mining, association rules, and query expansion. It then analyzes existing query expansion techniques based on association rules, points out their advantages and disadvantages, and identifies a common shortcoming: existing algorithms pay little attention to the efficiency of the underlying association rule mining algorithm or to whether that mining algorithm is suitable for the task.
2. To address this shortcoming, the thesis proposes, for the first time, a query expansion algorithm based on maximal frequent itemset mining, built on vector space model retrieval. The top n documents returned by the initial retrieval are word-segmented, and the resulting terms are represented in vertical data format, so that the support of an itemset is obtained by intersection. A set-enumeration tree is used as the data structure, and pruning strategies are applied to mine the maximal frequent itemsets. This yields the set of expansion terms, which are combined with the initial query terms for a second retrieval. Experimental results show that the algorithm is more efficient than previous algorithms.
3. The query expansion algorithm based on maximal frequent itemset mining assumes that the original query terms and the expansion terms are equally important, so it does not weight them. In addition, mining only maximal frequent itemsets loses the support information of some frequent itemsets. To solve these problems, the thesis proposes a query expansion algorithm based on frequent closed itemsets. The algorithm uses the HT-struct linked structure and a depth-first search strategy, combined with pruning techniques, to mine frequent closed itemsets, derive association rules, and obtain the set of expansion terms. It also weights each expansion term according to the confidence of the corresponding rule. Experiments show that the algorithm is more efficient and that it is feasible.
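The ideas named in the abstract can be illustrated with a small sketch. It is not the thesis's algorithm: the HT-struct structure and the specific pruning strategies are not described here, so the code below substitutes plain depth-first enumeration with support-based pruning and a brute-force filter for maximal and closed itemsets. All function names, the toy documents, and the `min_support` value are assumptions made for illustration.

```python
def vertical_format(docs):
    """Map each term to the set of ids of the documents that contain it."""
    tidsets = {}
    for doc_id, terms in enumerate(docs):
        for term in set(terms):
            tidsets.setdefault(term, set()).add(doc_id)
    return tidsets


def frequent_itemsets(tidsets, min_support):
    """Depth-first walk of a set-enumeration tree over the frequent terms.

    The support of an itemset is the size of the intersection of its
    terms' document-id sets; infrequent branches are not extended.
    """
    items = sorted(t for t, ids in tidsets.items() if len(ids) >= min_support)
    found = {}

    def extend(prefix, prefix_ids, start):
        for i in range(start, len(items)):
            ids = prefix_ids & tidsets[items[i]]
            if len(ids) >= min_support:
                itemset = prefix + (items[i],)
                found[frozenset(itemset)] = len(ids)
                extend(itemset, ids, i + 1)

    all_ids = set().union(*tidsets.values())
    extend((), all_ids, 0)
    return found


def maximal_itemsets(found):
    """Frequent itemsets with no frequent proper superset (brute force)."""
    return {s: n for s, n in found.items() if not any(s < t for t in found)}


def closed_itemsets(found):
    """Frequent itemsets with no proper superset of equal support (brute force)."""
    return {s: n for s, n in found.items()
            if not any(s < t and found[t] == n for t in found)}


def support_from_closed(closed, itemset):
    """Recover the support of a frequent itemset from the closed itemsets alone."""
    return max((n for s, n in closed.items() if itemset <= s), default=0)


def expansion_weights(closed, query_terms):
    """Weight each candidate term t by the confidence of the rule query -> t."""
    query = frozenset(query_terms)
    base = support_from_closed(closed, query)
    if base == 0:
        return {}
    candidates = set().union(*(s for s in closed if query <= s)) - query
    return {t: support_from_closed(closed, query | {t}) / base
            for t in candidates}


# Toy stand-in for the segmented top-n documents of an initial retrieval.
docs = [
    ["query", "expansion", "association", "rule"],
    ["query", "expansion", "retrieval"],
    ["query", "association", "rule"],
    ["query", "expansion", "association", "rule", "mining"],
    ["query", "expansion"],
]
found = frequent_itemsets(vertical_format(docs), min_support=2)
maximal = maximal_itemsets(found)
closed = closed_itemsets(found)
weights = expansion_weights(closed, ["query"])
```

On this toy input the single maximal itemset records only its own support, whereas the closed itemsets are enough to recover the support of every frequent itemset. That recovered support is what allows a confidence-based weight to be computed for each expansion term.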
[Degree-granting institution]: PLA Information Engineering University
[Degree level]: Master's
[Year degree awarded]: 2012
[Classification number]: TP311.13