An Improved Method for Generating Frequent 1-Itemsets and Its Experimental Study

Published: 2018-01-24 03:42

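  Keywords: data mining; association analysis; frequent 1-itemsets; incremental data pattern; time saving. Source: Jilin University, master's thesis, 2017. Document type: degree thesis.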


[Abstract]: People have collected and analyzed data since the beginning of civilization. Ancient weather summaries and forecasts were conclusions drawn from collecting and analyzing observations of weather changes in daily life. Planting times and planting methods for crops were likewise distilled from years of cultivation experience, which is also a process of data collection and analysis. The same holds for architecture, water conservancy, commerce and other fields: the collection and use of data has long been part of every aspect of life. Before the Internet, the use of data was mostly confined to a single region: local weather, regional crops, and building styles adapted to the local climate. With the emergence and development of the Internet and the global integration of information, people can obtain more, and more useful, data with greater ease, which means more of value can be extracted from it. This is what is now called data mining. Data mining aims to discover the value in data and mainly comprises cluster analysis, classification, association analysis, prediction, and deviation analysis. Association analysis, the direction this thesis concerns, summarizes related items in the data as a basis for further analysis. Many algorithms have been proposed to support association rule analysis by finding strongly associated data items. Most of them first generate the frequent 1-itemsets and then build on these in all subsequent steps. For an association rule analysis that is run only once, generating the frequent 1-itemsets requires one scan of the database.
When data keeps growing and association analysis is run repeatedly, however, every run scans the database again to generate the frequent 1-itemsets, so later runs rescan old data that has already been counted, which wastes a great deal of time. This thesis improves frequent 1-itemset generation for this case in order to save unnecessary database reading and scanning time. For incremental data, the improvement is achieved mainly by storing the candidate 1-itemset data produced while the frequent 1-itemsets are generated. It relies on the fact that the number of data records is far larger than the number of distinct item types. This reduces the time spent generating frequent 1-itemsets in later association rule analyses and therefore the running time of the whole algorithm.
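The idea of keeping candidate 1-itemset counts between runs can be sketched as follows. This is only a minimal illustration under stated assumptions, not the thesis's actual implementation: the class name `IncrementalItemCounter`, its methods, and the in-memory storage are hypothetical, and the thesis abstract does not specify how or where the candidate data is stored.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


class IncrementalItemCounter:
    """Hypothetical helper: keeps candidate 1-itemset support counts across runs."""

    def __init__(self) -> None:
        # Candidate 1-itemset -> number of transactions containing it.
        self.counts: Counter = Counter()
        # Total number of transactions counted so far.
        self.n_transactions = 0

    def update(self, new_transactions: Iterable[Iterable[Hashable]]) -> None:
        """Scan only the newly added transactions; old data is not reread."""
        for transaction in new_transactions:
            # set() ensures an item is counted once per transaction.
            self.counts.update(set(transaction))
            self.n_transactions += 1

    def frequent_items(self, min_support: float) -> Dict[Hashable, int]:
        """Return items whose relative support is at least min_support."""
        threshold = min_support * self.n_transactions
        return {item: c for item, c in self.counts.items() if c >= threshold}


counter = IncrementalItemCounter()
# First run: one scan over the initial data.
counter.update([["milk", "bread"], ["milk", "eggs"], ["bread"]])
first_run = counter.frequent_items(min_support=0.5)
# Later run: only the increment is scanned, stored counts are reused.
counter.update([["eggs"], ["eggs", "milk"], ["eggs"]])
second_run = counter.frequent_items(min_support=0.5)
```

Because the stored table has one entry per distinct item rather than one per record, it stays small compared with the database, which is the property the abstract relies on. All candidate counts are kept, not only the frequent ones, since an item that is infrequent now may become frequent after new data arrives.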
[Degree-granting institution]: Jilin University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP311.13
