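Feature Selection Based on Relation Information Entropy
Published: 2018-02-28 12:22
Keywords: information entropy; fuzzy set; rough set; neighborhood relation; fuzzy relation. Source: Bohai University, master's thesis, 2016. Document type: degree thesis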
[Abstract]: In today's era of information explosion, network information technology is developing rapidly, and large, disorderly data sets keep emerging in every field. Processing such data has become a central problem in data mining. Much of this data is uncertain or fuzzy, so valuable information has to be extracted from it. When the volume of information is especially large, appropriate data analysis methods are needed for classification and knowledge reduction. Rough set theory and fuzzy set theory are mathematical tools for handling uncertainty; they can be used to deal with both the uncertainty and the fuzziness of data. In recent years these theories have gained a significant place in data mining, machine learning, and pattern recognition, have become a research direction for many scholars, and continue to be extended to further fields with many practical results. The overall idea of this thesis is to combine Shannon entropy with the fundamentals of rough set theory, to propose the concepts of neighborhood relation information entropy and fuzzy relation information entropy, to discuss their properties in detail, and to analyze them experimentally. The specific work is as follows:
1. The neighborhood is one of the most important concepts in data classification and learning, and is used to distinguish samples with different decisions. This thesis proposes the neighborhood relation entropy to characterize the uncertainty of a neighborhood relation; it reflects the discriminating power of a feature subset. Unlike earlier neighborhood entropies, the neighborhood relation entropy is defined through the cardinality of the neighborhood relation itself, not through the cardinalities of neighborhood similarity classes. To describe how the uncertainty in the data changes as the feature subset changes, the concepts of neighborhood relation joint entropy, conditional neighborhood relation entropy, and neighborhood relation mutual information are proposed. In addition, a parameter is introduced into these measures to make them better suited to real-valued data. Based on these uncertainty measures, the significance of a feature subset is defined and a greedy feature selection algorithm is designed. Finally, experiments on UCI benchmark data sets compare the algorithm with existing ones. The results show that the feature selection algorithm based on neighborhood relation entropy outperforms several classical algorithms.
2. The fuzzy relation is redefined by means of a distance function, and the concepts of fuzzy relation joint entropy, conditional fuzzy relation entropy, and fuzzy relation mutual information are proposed and their properties discussed. The influence of the neighborhood radius and of the attribute subset on the fuzzy relation entropy is also discussed. Based on this theoretical study, a feature selection algorithm based on fuzzy relation entropy is designed and verified experimentally. The experiments show that, compared with fuzzy information entropy, the proposed algorithm not only reduces the complexity of attribute reduction but also improves the classification accuracy of samples, while shortening the reduction time to some extent, which gives it practical value.
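The abstract does not give the formulas, so the following is only a minimal sketch of how a relation-cardinality entropy and a greedy selection loop might fit together. Everything specific here is an assumption made for illustration, not the thesis's definition: the entropy form H(R) = -log2(|R| / n^2), the conditional entropy H(D|B) = H(R_B ∩ R_D) - H(R_B), the per-feature threshold `delta` (Chebyshev distance), the stopping tolerance `tol`, and all function names.

```python
import math


def neighborhood_relation(data, features, delta):
    """Pairs (i, j) whose values differ by at most delta on every chosen feature.

    Assumption: Chebyshev distance with a single radius `delta`.
    With no features selected, every pair is related.
    """
    n = len(data)
    return {
        (i, j)
        for i in range(n)
        for j in range(n)
        if all(abs(data[i][f] - data[j][f]) <= delta for f in features)
    }


def decision_relation(labels):
    """Pairs (i, j) of samples that share the same decision label."""
    n = len(labels)
    return {(i, j) for i in range(n) for j in range(n) if labels[i] == labels[j]}


def relation_entropy(relation, n):
    """Assumed form: H(R) = -log2(|R| / n^2), based on the relation's cardinality."""
    return -math.log2(len(relation) / (n * n))


def conditional_entropy(data, labels, features, delta):
    """Assumed form: H(D | B) = H(R_B & R_D) - H(R_B)."""
    n = len(data)
    r_b = neighborhood_relation(data, features, delta)
    r_d = decision_relation(labels)
    return relation_entropy(r_b & r_d, n) - relation_entropy(r_b, n)


def greedy_select(data, labels, delta, tol=1e-9):
    """Forward selection: repeatedly add the feature that lowers H(D | B) most."""
    remaining = list(range(len(data[0])))
    selected = []
    best = conditional_entropy(data, labels, selected, delta)
    while remaining:
        score, feat = min(
            (conditional_entropy(data, labels, selected + [f], delta), f)
            for f in remaining
        )
        if best - score <= tol:  # no meaningful improvement: stop
            break
        selected.append(feat)
        remaining.remove(feat)
        best = score
    return selected


if __name__ == "__main__":
    # Toy data (made up for illustration): feature 0 separates the two classes.
    data = [[0.1, 0.5], [0.2, 0.95], [0.8, 0.1], [0.9, 0.6]]
    labels = [0, 0, 1, 1]
    print(greedy_select(data, labels, delta=0.3))
```

In this sketch a smaller conditional entropy means the neighborhood relation induced by the selected features is more consistent with the decision labels, which is one plausible reading of "discriminating power" in the abstract. The pairwise construction is quadratic in the number of samples and is meant for clarity, not for large data sets.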
[Degree-granting institution]: Bohai University
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: O236
Article ID: 1547290
Article link: https://www.wllwen.com/kejilunwen/yysx/1547290.html