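Research and Application of an Unconstrained Bayesian Classifier Based on the KDB Model
Published: 2018-05-05 18:43
Topic: data mining + unconstrained Bayesian classification; Source: Jilin University, master's thesis, 2017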
[Abstract]: In recent years, the rapid development of the information age has produced massive amounts of data, and people processing that data cannot tell which parts matter for decision making. Discovering knowledge in large volumes of data and extracting useful information is the problem that data mining sets out to solve. Classification is one of the data mining tasks receiving growing attention from researchers, and progress on it helps in making key decisions from latent information.

Bayesian networks, founded on Bayesian theory and graph theory, are regarded as one of the most promising research methods in knowledge discovery, artificial intelligence, and data mining. Bayesian classification models are a branch of Bayesian networks and fall into two kinds: constrained and unconstrained. Naive Bayes is the most classical constrained Bayesian classification model and the basic framework from which the other constrained models are built. It is simple, fast, and efficient, but its conditional independence assumption usually does not hold in practice. The KDB (k-dependence Bayesian) classifier further relaxes the independence assumption among attributes, can construct network structures of arbitrary k-order structural complexity, and still retains the computational efficiency of naive Bayes.

Although KDB has shown remarkable classification performance, its constrained network structure makes it unable to express the Markov blanket of the class variable that corresponds to the optimal structure. In addition, the useful information contained in the test instance is not fully exploited, so the final decision may be biased and classification accuracy may drop. To address these problems, this thesis proposes UKDB, an unconstrained k-dependence classifier, based on Markov blanket analysis of the class variable combined with local learning. The main work, covering structural complexity, classification performance, and computational efficiency, is as follows:

1. UKDB can express attribute dependencies of arbitrary k-order structural complexity and can output two sub-classifiers: a global model describing the causal relationships implicit in the training set, and a local model describing the causal relationships implicit in the test set. The local model can be regarded as a complement to the global model.
2. Experimental results on 50 data sets from the UCI machine learning repository show that UKDB outperforms KDB overall in terms of 0-1 loss, bias, and variance, while requiring only relatively small computational complexity.
3. Applying the UKDB model to medical diagnosis is of considerable value. Experimental results on the Wisconsin breast cancer data set show that, compared with KDB, UKDB reduces classification error significantly, by 54.1%.

Overall, UKDB reflects the trade-off between structural complexity and performance more fully than KDB does.
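As a rough illustration of how KDB relaxes the naive Bayes independence assumption, the sketch below follows the commonly described KDB structure-learning procedure: rank attributes by mutual information with the class, then give each attribute up to k parents chosen from the previously ranked attributes by conditional mutual information given the class. This is a minimal sketch under stated assumptions, not the thesis's implementation and not UKDB itself. The function names (`mutual_information`, `conditional_mutual_information`, `kdb_structure`) and the toy data are hypothetical. With the resulting parents, classification would use P(c | x) ∝ P(c) ∏ P(x_i | c, parents(x_i)), and k = 0 reduces to naive Bayes.

```python
import math
from collections import Counter


def mutual_information(xs, cs):
    """Empirical I(X; C) in nats for two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, cs))
    nx, nc = Counter(xs), Counter(cs)
    return sum(
        (cnt / n) * math.log(cnt * n / (nx[x] * nc[c]))
        for (x, c), cnt in joint.items()
    )


def conditional_mutual_information(xs, ys, cs):
    """Empirical I(X; Y | C) in nats for three equal-length discrete sequences."""
    n = len(xs)
    nxyc = Counter(zip(xs, ys, cs))
    nxc, nyc, nc = Counter(zip(xs, cs)), Counter(zip(ys, cs)), Counter(cs)
    return sum(
        (cnt / n) * math.log(cnt * nc[c] / (nxc[(x, c)] * nyc[(y, c)]))
        for (x, y, c), cnt in nxyc.items()
    )


def kdb_structure(columns, labels, k):
    """Return {attribute: [attribute parents]} for a k-dependence structure.

    columns maps an attribute name to its list of discrete values.
    The class variable is implicitly a parent of every attribute.
    """
    # Step 1: order attributes by decreasing mutual information with the class.
    order = sorted(
        columns,
        key=lambda a: mutual_information(columns[a], labels),
        reverse=True,
    )
    parents, added = {}, []
    for attr in order:
        # Step 2: among attributes already added, keep the (at most) k with
        # the highest conditional mutual information given the class.
        ranked = sorted(
            added,
            key=lambda p: conditional_mutual_information(
                columns[attr], columns[p], labels
            ),
            reverse=True,
        )
        parents[attr] = ranked[:k]
        added.append(attr)
    return parents


# Hypothetical toy data: three binary attributes and a binary class.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],
    "b": [0, 0, 0, 1, 1, 1, 1, 0],
    "c": [0, 1, 0, 1, 0, 1, 0, 1],
}

if __name__ == "__main__":
    for k in (0, 1, 2):
        print(k, kdb_structure(columns, labels, k))
```

The choice of k is the structural-complexity knob the abstract refers to: a larger k lets each attribute depend on more attributes, at the cost of larger conditional probability tables to estimate.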
[Degree-granting institution]: Jilin University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP18
Article No.: 1848860
Article link: https://www.wllwen.com/kejilunwen/zidonghuakongzhilunwen/1848860.html