Knowledge Discovery Using Attribute Partial Order Theory Based on Continuous Attribute Discretization

Published: 2018-04-30 23:22

Topics: attribute partial order theory; continuous attributes. Source: Master's thesis, Yanshan University, 2016

[Abstract]: The rapid development of computer technology and the widespread use of database systems have given people a powerful platform for making better use of information technology in production, and the ability to collect and retrieve data has improved markedly. Rich and varied information-sensing and data-acquisition devices have brought us into the era of big data. However, choosing the means and techniques for discovering, in massive data, the rules and knowledge that are most valuable for decision-making has become a key open problem in frontier science and technology. As important means of processing data, data mining and machine learning have become a major research focus. Yet many knowledge discovery and data mining algorithms require the attribute values they process to be discrete, whereas real-world data are often continuous, so continuous attributes must first be discretized. Taking knowledge discovery and rule extraction on UCI data sets as its basis, this thesis discretizes the multidimensional attribute values of the data sets with mainstream continuous-attribute discretization methods. Using the visual representation of the formal context from formal concept analysis as a key tool, it builds binary formal contexts from the discretized data. With hierarchical optimization of the formal context and the generation of attribute partial order structure diagrams as the key steps, it generates attribute partial order structure diagrams for different data sets, extracts knowledge rules, and evaluates each discretization scheme by comparing the results with the distribution characteristics and class labels of the data set. The main contributions are as follows:
1) A hybrid data integration framework is built on the knowledge-representation principle of the formal context, so that all kinds of data can be represented visually within one framework and mined using attribute partial order structure theory.
2) A mathematical method of knowledge discovery that generates partial order structures from an optimized formal context is studied, producing data partial order structure diagrams based on the principle of attributes covering objects (or objects covering attributes).
3) To address the limitations of existing discretization algorithms on high-dimensional data, a discretization method for high-dimensional data based on nonlinear dimensionality reduction, namely an improved locally linear embedding (LLE) algorithm, is studied, realizing knowledge discovery with attribute partial order theory based on continuous attribute discretization.
After dimensionality reduction and discretization of the UCI data sets, the extracted knowledge is more accurate and complex knowledge rules are simplified, which is significant for knowledge rule extraction and visualization in big data.
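The pipeline summarized in the abstract (discretize continuous attributes, build a binary formal context, derive a partial order over attributes, read off rules) can be illustrated with a minimal sketch. It rests on assumptions not stated in the thesis: equal-width binning stands in for the unnamed "mainstream" discretization methods; each (column, bin) pair becomes one binary attribute of the formal context; and an edge a -> b is drawn when the set of objects having a is a proper subset of the set having b. This subset order is a simplification chosen for illustration, not necessarily the thesis's own construction of the attribute partial order structure diagram, and the improved LLE step is not shown. All function names are hypothetical.

```python
from bisect import bisect_right


def equal_width_cuts(values, k):
    """Return the k-1 interior cut points that split [min, max] into k equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def build_formal_context(rows, names, k):
    """Discretize each continuous column into k bins and return a binary formal context.

    Each (column, bin) pair becomes one binary attribute named "<column>#<bin>".
    The result maps object index g to the frozenset of binary attributes it has.
    """
    cuts = {name: equal_width_cuts([row[j] for row in rows], k)
            for j, name in enumerate(names)}
    return {
        g: frozenset(f"{name}#{bisect_right(cuts[name], row[j])}"
                     for j, name in enumerate(names))
        for g, row in enumerate(rows)
    }


def attribute_extents(context):
    """Map each binary attribute that occurs to the set of objects having it (its extent)."""
    ext = {}
    for g, attrs in context.items():
        for m in attrs:
            ext.setdefault(m, set()).add(g)
    return {m: frozenset(objs) for m, objs in ext.items()}


def cover_edges(ext):
    """Return sorted (a, b) pairs where extent(a) is a proper subset of extent(b)
    and no other attribute's extent lies strictly between them.

    Each pair reads as the rule "a implies b" on this data. Attributes with
    identical extents are treated as equivalent and get no edge between them.
    """
    return sorted(
        (a, b)
        for a in ext for b in ext
        if ext[a] < ext[b]
        and not any(ext[a] < ext[c] < ext[b] for c in ext)
    )


if __name__ == "__main__":
    # Toy data: four objects, two continuous attributes x and y.
    rows = [(1.0, 10.0), (2.0, 12.0), (8.0, 14.0), (9.0, 30.0)]
    context = build_formal_context(rows, ["x", "y"], k=2)
    for a, b in cover_edges(attribute_extents(context)):
        print(f"{a} -> {b}")
    # prints "x#0 -> y#0" then "y#1 -> x#1"
```

On this toy data every object in the low bin of x is also in the low bin of y, which is what the edge x#0 -> y#0 records. Swapping in a different discretizer only requires replacing equal_width_cuts; the context and order construction stay the same.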
[Degree-granting institution]: Yanshan University
[Degree level]: Master's
[Year degree granted]: 2016
[CLC classification]: TP311.13; O153.1
