Research on Parallel Construction Algorithms for Approximate Concept Lattices in the Hadoop Environment
Published: 2019-03-26 08:40
[Abstract]: With the rapid development of information technology, the volume of data worldwide is growing explosively. Big data refers to large, complex data sets from which valuable information can be extracted only with new processing models, and processing it usually relies on distributed computing. Distributed computing splits a complex problem into small parts, assigns them to multiple computers, and combines their partial results into the final result, which can greatly reduce program running time. The concept lattice is a tool for analyzing data efficiently and acquiring knowledge. It shows the relationships between objects and attributes intuitively and has been applied in many fields, such as machine learning, information retrieval and expert systems. In practice, information systems often contain missing values. A formal context that contains missing values is called an incomplete formal context, and the concept lattice model built on it is called an approximate concept lattice. On massive data, traditional serial algorithms for constructing approximate concept lattices are inefficient. To address this, and based on an in-depth analysis of the characteristics of approximate concept lattices and incomplete information systems, this thesis proposes two parallel algorithms for constructing approximate concept lattices with the MapReduce framework in a Hadoop environment: a parallel merging algorithm and a parallel incremental algorithm. Specifically:
(1) Parallel merging algorithm: within the MapReduce framework, two concept lattices are first generated and then merged into one. Experiments on the LD2011__2014 data set show that the algorithm is feasible and efficient.
(2) Parallel incremental algorithm: a parallel algorithm built on the classical incremental algorithm that generates the approximate concept lattice directly, without a merging step. Experiments on the LD2011__2014 data set show that the algorithm is feasible and efficient.
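The abstract does not give the thesis's formal definition of an approximate concept. One common way to handle missing values, assumed here, is to pair two derivation operators: a "certain" one that counts a missing value as absent and a "possible" one that counts it as present. The sketch below computes both for a set of objects; the dictionary encoding, the names `certain_intent` and `possible_intent`, and the sample data are illustrative assumptions rather than the thesis's own definitions.

```python
from typing import Dict, FrozenSet, Iterable, Optional

# Hypothetical encoding of an incomplete formal context: object -> attribute -> value,
# where 1 = has the attribute, 0 = lacks it, None = value missing ("?").
IncompleteContext = Dict[str, Dict[str, Optional[int]]]


def certain_intent(ctx: IncompleteContext, objs: Iterable[str], attrs: Iterable[str]) -> FrozenSet[str]:
    """Attributes every object in objs is known to have (all values are 1)."""
    objs = list(objs)
    return frozenset(m for m in attrs if all(ctx[g][m] == 1 for g in objs))


def possible_intent(ctx: IncompleteContext, objs: Iterable[str], attrs: Iterable[str]) -> FrozenSet[str]:
    """Attributes no object in objs is known to lack (values are 1 or missing)."""
    objs = list(objs)
    return frozenset(m for m in attrs if all(ctx[g][m] != 0 for g in objs))


# Any way of filling in the missing values yields an intent for objs that lies
# between these two sets: certain_intent <= true intent <= possible_intent.
ATTRS = ["a", "b", "c"]
ctx: IncompleteContext = {
    "g1": {"a": 1, "b": None, "c": 0},
    "g2": {"a": 1, "b": 1, "c": None},
}
print(sorted(certain_intent(ctx, ["g1", "g2"], ATTRS)))   # ['a']
print(sorted(possible_intent(ctx, ["g1", "g2"], ATTRS)))  # ['a', 'b']
```

The gap between the two sets is exactly the uncertainty introduced by the missing values; with a complete context the two operators coincide.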
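A merge step like the one in the parallel merging algorithm can rest on a property that holds for ordinary concept lattices: if two contexts share the same attribute set M and have disjoint object sets, every intent of the combined context is the intersection of one intent from each part (M counts as an intent of both). The sketch below arranges this in a map/reduce shape in plain Python. It assumes a complete binary context split by objects. The function names are hypothetical, and the abstract states neither how the thesis partitions the data nor how it treats missing values during merging.

```python
from functools import reduce
from itertools import product
from typing import Dict, FrozenSet, List, Set

# Complete context, partitioned by objects: object -> set of attributes it has.
Rows = Dict[str, FrozenSet[str]]
Intents = Set[FrozenSet[str]]


def local_intents(rows: Rows, attrs: FrozenSet[str]) -> Intents:
    """'Map' step (hypothetical): all concept intents of one object partition."""
    found: Intents = {attrs}  # the full attribute set M is always an intent
    for row in rows.values():
        found |= {b & row for b in found}
    return found


def merge_intents(i1: Intents, i2: Intents) -> Intents:
    """'Reduce' step (hypothetical): intents of the union of two partitions over the same M."""
    return {b1 & b2 for b1, b2 in product(i1, i2)}


def merged_lattice(parts: List[Rows], attrs: FrozenSet[str]) -> Dict[FrozenSet[str], FrozenSet[str]]:
    """Merge the partial results, then attach each intent's extent in the full context."""
    all_rows = {g: r for part in parts for g, r in part.items()}
    # Each local_intents call is independent, so each could run as a separate map task.
    intents = reduce(merge_intents, (local_intents(p, attrs) for p in parts))
    return {b: frozenset(g for g, r in all_rows.items() if b <= r) for b in intents}


M = frozenset("abc")
part1 = {"g1": frozenset("ab"), "g2": frozenset("bc")}
part2 = {"g3": frozenset("a")}
lattice = merged_lattice([part1, part2], M)
print(len(lattice))  # 6
```

Here the partial lattices are computed sequentially; in a Hadoop job the per-partition work would be distributed and only the intent sets would need to be shipped to the reducer.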
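Incremental construction inserts objects one at a time and updates the lattice after each insertion. The sketch below shows one simple form of that update on a complete binary context: existing intents remain intents, and the only new intents are intersections of existing intents with the new object's attribute set. It is a sequential illustration with hypothetical names (`add_object`, `build`). It is not the thesis's "classical incremental algorithm" itself, and the abstract does not say how the thesis parallelizes this on MapReduce or handles missing values, so neither is modeled.

```python
from typing import Dict, FrozenSet

Rows = Dict[str, FrozenSet[str]]                 # object -> attributes it has
Lattice = Dict[FrozenSet[str], FrozenSet[str]]   # intent -> extent


def add_object(lattice: Lattice, rows: Rows, g: str, row: FrozenSet[str]) -> Lattice:
    """Concepts of rows + {g: row}, derived from the concepts of rows."""
    new_rows = {**rows, g: row}
    # Old intents remain intents; the only new ones are intersections with the new row.
    intents = set(lattice) | {b & row for b in lattice}
    # Extents are recomputed from scratch here for clarity, not efficiency.
    return {b: frozenset(h for h, r in new_rows.items() if b <= r) for b in intents}


def build(rows: Rows, attrs: FrozenSet[str]) -> Lattice:
    """Insert objects one at a time, starting from the empty context's only concept."""
    lattice: Lattice = {attrs: frozenset()}
    seen: Rows = {}
    for g, row in rows.items():
        lattice = add_object(lattice, seen, g, row)
        seen[g] = row
    return lattice


M = frozenset("abc")
rows = {"g1": frozenset("ab"), "g2": frozenset("bc"), "g3": frozenset("a")}
lattice = build(rows, M)
print(len(lattice))  # 6
```

Because each insertion only reads the current lattice and one new row, the result does not depend on a separate merge step, which matches the abstract's description of generating the lattice directly.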
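[Degree-granting institution]: Kunming University of Science and Technology
[Degree level]: Master's
[Year awarded]: 2017
[Classification numbers (CLC)]: TP311.13; TP338.6
Article ID: 2447368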
Article link: https://www.wllwen.com/kejilunwen/jisuanjikexuelunwen/2447368.html