Research on Parallel Construction Algorithms for Approximate Concept Lattices in the Hadoop Environment

Published: 2019-03-26 08:40

[Abstract]: With the rapid development of information technology, the global volume of data is growing explosively. Big data refers to large, complex data sets from which valuable information can be mined only with new processing models. Big data processing usually relies on distributed computing, which splits a complex problem into small parts, assigns them to multiple computers, and combines the partial results into the final result. Distributed computing can greatly reduce program running time. The concept lattice is a tool for efficiently analyzing data and acquiring knowledge, and it has been applied in many fields, such as machine learning, information retrieval, and expert systems. A concept lattice shows the relationships between objects and attributes in an intuitive way. In practice, information systems often contain missing values. A formal context that contains missing values is called an incomplete formal context, and the concept lattice model built on it is called an approximate concept lattice. On massive data, traditional serial algorithms for constructing approximate concept lattices are inefficient. To address this problem, the characteristics of approximate concept lattices and incomplete information systems are analyzed in depth, and two parallel construction algorithms for approximate concept lattices are proposed, both based on the MapReduce framework in the Hadoop environment: a parallel merging algorithm and a parallel incremental algorithm. Specifically: (1) Parallel merging algorithm: within the MapReduce framework, two concept lattices are generated first and are then merged. Experiments use the LD2011__2014 data set, and the results show that the parallel algorithm is feasible and efficient. (2) Parallel incremental algorithm: a parallel algorithm is proposed on the basis of the classical incremental algorithm. It generates the approximate concept lattice directly, with no merging step. Experiments use the LD2011__2014 data set, and the results show that this algorithm is feasible and efficient.
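The split-then-merge idea in item (1) can be illustrated on a small scale. The following is a minimal sketch, not the thesis's algorithm: it assumes a classical (complete) formal context with no missing values, runs in plain Python rather than on Hadoop, and uses a hypothetical toy context and hypothetical function names (`intents`, `extent`, `merge_intents`). It relies on a standard property of formal concept analysis: when the objects are partitioned into two sub-contexts over the same attributes, the intents of the full context are exactly the pairwise intersections of the intents of the two parts.

```python
from itertools import combinations

# Hypothetical toy context: object -> set of attributes it has (no missing values).
CONTEXT = {
    "o1": {"a", "b"},
    "o2": {"b", "c"},
    "o3": {"a", "b", "c"},
    "o4": {"c"},
}
ATTRIBUTES = frozenset({"a", "b", "c"})


def intents(context, attributes):
    """Return every concept intent: the attributes shared by each subset of objects.

    Brute force over all object subsets, so only suitable for tiny examples.
    """
    result = {frozenset(attributes)}  # intent of the empty object set
    objects = list(context)
    for size in range(1, len(objects) + 1):
        for group in combinations(objects, size):
            common = frozenset(attributes)
            for obj in group:
                common &= context[obj]
            result.add(common)
    return result


def extent(context, intent):
    """Return the objects that have every attribute in the intent."""
    return frozenset(o for o, attrs in context.items() if intent <= attrs)


def merge_intents(left, right):
    """Merge the intents of two object partitions by pairwise intersection."""
    return {a & b for a in left for b in right}


# "Map"-like step: build the intents of each half of the objects independently.
part1 = {o: CONTEXT[o] for o in ("o1", "o2")}
part2 = {o: CONTEXT[o] for o in ("o3", "o4")}

# "Reduce"-like step: merge the two partial results into the full intent set.
merged = merge_intents(intents(part1, ATTRIBUTES), intents(part2, ATTRIBUTES))

# Each intent, paired with its extent in the full context, is one formal concept.
concepts = sorted((sorted(extent(CONTEXT, i)), sorted(i)) for i in merged)
for ext, itt in concepts:
    print(ext, itt)
```

The merged result matches what a direct computation over the whole context would give, which is what makes building partial lattices independently and merging them afterwards possible. Handling missing values, as approximate concept lattices do, would require a different definition of intent and is not covered by this sketch.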
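[Degree-granting institution]: Kunming University of Science and Technology
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP311.13;TP338.6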

[References]