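Research on Mining Algorithms for Uncertain Data
Published: 2018-01-23 18:27
Keywords: data mining, uncertain data, maximal pattern, frequent pattern  Source: Shanghai Jiao Tong University, 2015 master's thesis  Document type: degree thesis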
[Abstract]: With the rapid development of information technology, fields such as finance, logistics, and astronomical research generate and record massive amounts of data around the clock. In most cases these data contain errors or are only partially complete, and this uncertainty means that traditional data mining methods are no longer applicable to uncertain data. This thesis studies mining algorithms for uncertain data: it analyzes frequent pattern mining and maximal pattern mining over uncertain data and proposes a new algorithm for each, enriching the available data processing techniques and improving mining efficiency. Frequent pattern mining is a core problem in data mining, and this thesis proposes ProEclat, a frequent pattern mining algorithm for uncertain data based on a vertical structure. ProEclat represents the dataset in vertical format, which avoids repeated scans of the dataset, and decides whether an itemset is frequent using a two-stage model, which substantially improves computational efficiency. Experiments show that ProEclat scales well and outperforms comparable algorithms. Maximal pattern mining is an important branch of frequent itemset mining, and this thesis proposes U-GenMax, a depth-first maximal pattern mining algorithm for uncertain data. U-GenMax applies pruning optimizations such as a multi-step backtracking mechanism, an item ordering strategy, and local projection to reduce running time. Experiments and analysis show that U-GenMax performs well and is especially suited to sparse datasets and to dense datasets with relatively high support.
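To make the vertical-format idea concrete, here is a minimal Python sketch. It is not ProEclat or U-GenMax, whose details the abstract does not give. It assumes that frequency is measured by expected support, that items occur independently within a transaction, and that the toy database, the threshold, and every function name are hypothetical. The database is scanned once to build one probabilistic tid-list per item, and a depth-first search then extends itemsets by joining tid-lists instead of rescanning transactions. A naive filter at the end keeps only the maximal frequent itemsets.

```python
from typing import Dict, FrozenSet, List

# Hypothetical toy uncertain database: each transaction maps an item to the
# probability that the item is really present in that transaction.
TRANSACTIONS: List[Dict[str, float]] = [
    {"a": 0.9, "b": 0.8, "c": 0.5},
    {"a": 0.6, "b": 0.7},
    {"a": 1.0, "c": 0.4},
    {"b": 0.5, "c": 0.9},
]

# transaction id -> probability that the itemset occurs in that transaction
TidList = Dict[int, float]


def to_vertical(transactions: List[Dict[str, float]]) -> Dict[str, TidList]:
    """Build one probabilistic tid-list per item in a single database scan."""
    vertical: Dict[str, TidList] = {}
    for tid, txn in enumerate(transactions):
        for item, prob in txn.items():
            vertical.setdefault(item, {})[tid] = prob
    return vertical


def intersect(itemset_tids: TidList, item_tids: TidList) -> TidList:
    """Join an itemset's tid-list with a single item's tid-list.

    Probabilities multiply, which assumes item independence.
    """
    return {tid: p * item_tids[tid]
            for tid, p in itemset_tids.items() if tid in item_tids}


def expected_support(tids: TidList) -> float:
    """Expected number of transactions that contain the itemset."""
    return sum(tids.values())


def mine_frequent(transactions: List[Dict[str, float]],
                  min_esup: float) -> Dict[FrozenSet[str], float]:
    """Depth-first, Eclat-style search for expected-support frequent itemsets."""
    vertical = to_vertical(transactions)
    result: Dict[FrozenSet[str], float] = {}

    def extend(prefix: FrozenSet[str], prefix_tids: TidList,
               candidates: List[str]) -> None:
        for i, item in enumerate(candidates):
            tids = intersect(prefix_tids, vertical[item])
            esup = expected_support(tids)
            if esup < min_esup:
                continue  # anti-monotone: no superset can be frequent
            itemset = prefix | {item}
            result[itemset] = esup
            extend(itemset, tids, candidates[i + 1:])

    # The empty itemset occurs in every transaction with probability 1.
    all_tids = {tid: 1.0 for tid in range(len(transactions))}
    extend(frozenset(), all_tids, sorted(vertical))
    return result


def maximal_only(frequent: Dict[FrozenSet[str], float]) -> Dict[FrozenSet[str], float]:
    """Keep itemsets with no frequent proper superset (naive quadratic filter)."""
    return {s: v for s, v in frequent.items()
            if not any(s < other for other in frequent)}


if __name__ == "__main__":
    frequent = mine_frequent(TRANSACTIONS, min_esup=1.0)
    for itemset, esup in sorted(frequent.items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), round(esup, 2))
    print("maximal:", [sorted(s) for s in maximal_only(frequent)])
```

With the toy data and a threshold of 1.0, the frequent itemsets are {a}, {b}, {c}, and {a, b}, and the maximal ones are {a, b} and {c}. A dedicated maximal-pattern miner would prune during the search rather than filter afterwards; the post-hoc filter is used here only to keep the sketch short.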
[Degree-granting institution]: Shanghai Jiao Tong University
[Degree level]: Master's
[Year degree granted]: 2015
[Classification number]: TP311.13