Research on Concept Learning Based on FCA
Published: 2019-06-29 15:03
[Abstract]: With the continuing development of information technology, people obtain data through an increasing variety of channels, such as television, newspapers, and the Internet, and the interval between data acquisitions keeps shrinking. Faced with massive structured, unstructured, and semi-structured data, how to mine its latent value quickly and effectively is a current research focus, and both a challenge and an opportunity for machine learning. Rough set theory was proposed by Pawlak in 1982 and is widely used in data mining, machine learning, decision analysis, and other fields. Formal concept analysis (FCA), proposed by R. Wille in 1982, is a mathematical tool for knowledge discovery that is widely used in data mining, clustering, classification, and other fields. Rough sets are mainly used to represent uncertain knowledge, whereas FCA mainly mines the intrinsic relationship between a class of objects and the attributes those objects share; combining the two makes it possible to mine uncertain relationships between objects and their attributes. In an era of multi-channel information acquisition, information from a single source is often fuzzy or incomplete, so fusing information from multiple sources is necessary. The aim of fusion is to merge information from several sources into a single, more informative body, so that fusion achieves a "1+1>2" effect. Building on main-feature concept learning and multi-source fuzzy concept learning in FCA, this thesis studies the mathematical properties of main features and of information fusion, designs a fusion algorithm and fuzzy concept learning algorithms, and verifies the effectiveness of the proposed methods through numerical experiments. The main contributions are as follows:
1. The a-main feature is defined in a formal context, and the difference between necessary features and likely features is stated in mathematical language. The influence of the base rate in a formal context on cognition is studied, and a credibility measure between a feature and a concept is defined by combining the base rate with the a-main feature. The thesis examines how credibility expresses the degree to which an object possessing a given attribute (feature) can be believed to belong to a concept, and discusses the relationship between credibility and main features. A worked case shows that credibility measures object membership better than the main feature alone, and verifies the effectiveness and feasibility of the model.
2. Conditional entropy is a measure of information. Information fusion based on conditional entropy can reduce redundant information while improving classification quality, making the similarity classes finer: objects within the same class differ little, while different classes differ greatly. Fuzzy concept learning is carried out on top of the fusion. A conditional entropy fusion algorithm is designed according to the proposed fusion method, and on that basis two fuzzy concept learning algorithms are designed, one based on object information and one based on attribute information. A series of numerical experiments on UCI datasets then compares the proposed conditional entropy fusion with traditional mean fusion. The results verify that conditional entropy fusion reduces redundant information while improving classification quality, whereas mean fusion merely compresses the multi-source information by averaging and cannot improve classification quality.
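The abstract does not give the fusion algorithm itself, so the following is only a minimal sketch of one plausible reading of conditional-entropy-based fusion, not the thesis's method. It assumes discrete (non-fuzzy) attribute values, a known class label per object, and a selection rule that keeps, for each attribute, the column from whichever source has the lowest Shannon conditional entropy of the labels given that attribute. The names `conditional_entropy`, `fuse_by_min_entropy`, `source_a`, and `source_b` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def conditional_entropy(attribute_values, labels):
    """Shannon conditional entropy H(labels | attribute), in bits."""
    n = len(labels)
    # Group the labels by the attribute value of each object.
    groups = defaultdict(list)
    for value, label in zip(attribute_values, labels):
        groups[value].append(label)
    entropy = 0.0
    for members in groups.values():
        weight = len(members) / n  # P(attribute = value)
        for count in Counter(members).values():
            p = count / len(members)  # P(label | attribute = value)
            entropy -= weight * p * math.log2(p)
    return entropy


def fuse_by_min_entropy(sources, labels):
    """Assumed fusion rule: per attribute, keep the column from the
    source whose values leave the least uncertainty about the labels."""
    fused = {}
    for attr in sources[0]:
        best = min(sources, key=lambda s: conditional_entropy(s[attr], labels))
        fused[attr] = best[attr]
    return fused


# Toy data: two sources describe the same four objects.
labels = ["x", "x", "y", "y"]
source_a = {"color": [0, 0, 1, 1], "size": [0, 1, 0, 1]}
source_b = {"color": [0, 1, 0, 1], "size": [0, 0, 1, 1]}

fused = fuse_by_min_entropy([source_a, source_b], labels)
print(fused)
```

In this toy data, `color` from `source_a` and `size` from `source_b` each separate the two classes perfectly, so the fused table takes one column from each source. A mean fusion would instead average the two sources' columns element-wise. The thesis works with fuzzy data, which this discrete sketch does not cover.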
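[Degree-granting institution]: Chongqing University of Technology (重庆理工大学)
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP18
Article ID: 2507909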
Article link: https://www.wllwen.com/kejilunwen/zidonghuakongzhilunwen/2507909.html