Research on Cost-Sensitive Models for Processing the Boundary Region of Three-Way Decisions
Topic: three-way decisions + boundary region processing; Source: Anhui University, master's thesis, 2017
[Abstract]: Three-way decision theory (3WD) consists of three decision rules: acceptance, rejection, and non-commitment. Building on traditional two-way decisions, it adds a non-commitment option, also called a deferred decision, which can be taken when information is insufficient. Yao proposed three-way decision theory while studying rough sets and decision-theoretic rough sets, and it gives the three regions of rough set theory a reasonable semantic interpretation: objects in the positive region are accepted, objects in the negative region are rejected, and objects in the boundary region must await further observation before a decision is made, that is, the decision is deferred. This pattern closely resembles how people decide when solving real problems, and it has been widely applied in many fields, such as medical diagnosis, investment decisions, and spam filtering.

The three-way decision model based on decision-theoretic rough sets is the most widely used three-way decision model. It is cost-sensitive to some degree when handling classification problems, and it can compute the thresholds α and β directly from the loss functions, but it does not discuss the boundary region any further. The three-way decision model based on the constructive covering algorithm (CCA) introduces CCA into three-way decision theory and opens a new research direction. It forms the three regions automatically without any parameters having to be discussed, and it proposes three principles for processing the boundary region, but none of these principles takes cost sensitivity into account during classification. In recent years, as data mining and machine learning have developed, it has become increasingly clear that classification problems are often cost-sensitive, and how to process the boundary region of three-way decisions effectively has become an urgent problem in the field. This thesis therefore proposes two cost-sensitive classification models for processing the boundary region, with the goal of reducing the classification loss and the number of misclassified high-cost samples as far as possible.

The main work of this thesis is as follows:
1. The thesis first briefly reviews the development of three-way decision theory and analyzes and summarizes its research status and open problems. It then introduces in detail the theory behind two classical three-way decision models: the decision-theoretic rough set model and the CCA-based three-way decision model. Finally, it proposes two cost-sensitive models for processing the boundary region, one based on CCA and one based on K nearest neighbors, offering new solutions to the boundary region problem.
2. The CCA-based cost-sensitive boundary region processing model (CPBM) uses the relative magnitudes of the misclassification loss functions to adjust the boundary distance between a sample and the covers, so as to reduce the classification loss incurred when processing boundary region samples. By contrast, the nearest-to-boundary principle of the CCA-based three-way decision model ignores cost sensitivity and simply assigns a sample to the class of the cover whose boundary is closest to it. Compared with that non-cost-sensitive principle, CPBM effectively improves the recall of high-cost samples when processing the boundary region, by as much as 20%, and thereby reduces the classification loss.
3. The K-nearest-neighbor-based cost-sensitive boundary region processing model (CTK) combines the K nearest neighbors idea with a cost-sensitive method. When processing boundary region samples, it quantifies the loss of each possible decision and selects the decision with the smallest loss, thereby reducing the classification loss. Using the optimal value of K it obtains, the model makes full use of the class information of the K nearest covers to improve classification accuracy. Compared with ordinary non-cost-sensitive methods, CTK therefore not only reduces the classification loss effectively when processing the boundary region, but also achieves a relatively low classification error rate on some data sets.
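The abstract notes that the decision-theoretic rough set model computes the thresholds α and β directly from loss functions. A minimal sketch of that calculation follows, using the standard threshold formulas from the decision-theoretic rough set literature. The function names, the parameter names, and the numeric losses are illustrative assumptions and are not taken from the thesis. The helper `resolve_boundary` is a generic minimum-expected-loss rule included only to show what a cost-sensitive treatment of a deferred sample can look like; it is not the CPBM or CTK procedure.

```python
def dtrs_thresholds(l_pp, l_bp, l_np, l_pn, l_bn, l_nn):
    """Compute (alpha, beta) from six losses.

    l_ap is the loss of action a (p = accept, b = defer, n = reject)
    when the object belongs to the target class; l_an is the loss of
    the same action when it does not. Names are hypothetical.
    """
    # Usual ordering assumptions: a correct decision costs no more than
    # deferring, and deferring costs less than a wrong decision.
    if not (l_pp <= l_bp < l_np and l_nn <= l_bn < l_pn):
        raise ValueError("losses violate the expected ordering")
    alpha = (l_pn - l_bn) / ((l_pn - l_bn) + (l_bp - l_pp))
    beta = (l_bn - l_nn) / ((l_bn - l_nn) + (l_np - l_bp))
    if not alpha > beta:
        raise ValueError("losses do not yield a boundary region (alpha <= beta)")
    return alpha, beta


def three_way_decide(p, alpha, beta):
    """Map a conditional probability p of class membership to a decision."""
    if p >= alpha:
        return "accept"   # positive region
    if p <= beta:
        return "reject"   # negative region
    return "defer"        # boundary region


def resolve_boundary(p, l_pp, l_np, l_pn, l_nn):
    """Force a two-way decision on a deferred sample by minimum expected loss.

    Generic illustration only, not the thesis's CPBM or CTK method.
    """
    loss_accept = l_pp * p + l_pn * (1 - p)
    loss_reject = l_np * p + l_nn * (1 - p)
    return "accept" if loss_accept <= loss_reject else "reject"


# Illustrative losses: missing a positive (l_np = 10) costs more than a
# false alarm (l_pn = 6); deferring costs 2 or 1.
alpha, beta = dtrs_thresholds(l_pp=0, l_bp=2, l_np=10, l_pn=6, l_bn=1, l_nn=0)
for p in (0.9, 0.4, 0.05):
    print(p, three_way_decide(p, alpha, beta))
```

With these illustrative losses the thresholds are α = 5/7 and β = 1/9, so a sample with probability 0.4 falls in the boundary region. Because rejecting a true positive is the costlier error here, `resolve_boundary` accepts such a sample whenever its probability exceeds 0.375 rather than 0.5.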
[Degree-granting institution]: Anhui University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP18; O225
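This thesis: Wang Gang (王刚); Research on Cost-Sensitive Models for Processing the Boundary Region of Three-Way Decisions [D]; Anhui University; 2017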
Article ID: 2079378
Article link: https://www.wllwen.com/kejilunwen/yysx/2079378.html