Research on Multi-Class Cost-Sensitive Learning Based on the Decision-Theoretic Rough Set Model
Source: Nanjing University of Science and Technology (南京理工大学), 2017 master's thesis. Document type: degree thesis
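Keywords: decision-theoretic rough sets; three-way decisions; cost-sensitive learning; multi-class problems; multi-stage classification; text classification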
[Abstract]: In recent years, with the development and spread of computer and Internet technology, the massive volumes of data being produced differ greatly from earlier information in scale and complexity, and they also carry a degree of uncertainty and fuzziness. The decision-theoretic rough set model is a method for handling imprecise and uncertain problems. By introducing the Bayesian decision procedure, it gives a systematic way to compute decision thresholds from a loss function matrix, and from this it derives a three-way decision framework based on the three regions of rough sets, which addresses the problem of how a user can make a reasonable decision when information is insufficient. Much existing work builds on the classical two-class decision-theoretic rough set model. Multi-class problems are mostly converted into several two-class problems and then handled with two-class decision-theoretic rough set classification methods, which both requires the user to supply more loss functions and lowers computational efficiency. In view of this, this thesis combines decision-theoretic rough sets with cost-sensitive learning, proposes a multi-class decision-theoretic rough set model, and studies cost-sensitive learning on the basis of that model. The main research contents are as follows.
First, the extension of decision-theoretic rough sets in the classification model. The loss function matrix plays an important role in decision-theoretic rough sets, and in two-class problems the decision thresholds can be computed easily from it. Starting from a semantic perspective and taking the loss function as the object of study, this thesis combines decision-theoretic rough sets with cost-sensitive learning and uses the multi-class cost matrix provided by classical cost-sensitive learning to derive loss function values for the multi-class case. On this basis it proposes a multi-class decision-theoretic rough set model and designs a cost-sensitive three-way decision classification algorithm. Analysis of comparative experiments shows that the proposed algorithm is effective on multi-class cost-sensitive problems.
Second, a multi-stage cost-sensitive learning method based on the multi-class decision-theoretic rough set model. Classification with decision-theoretic rough sets outputs three-way decision results. Only objects assigned to the positive region receive a class label with high confidence. Objects assigned to the boundary region receive a deferred decision because information is insufficient, and objects assigned to the negative region receive a rejection decision because confidence is low, so objects in these two regions are given no specific class label in the three-way result. To address this, this thesis proposes a multi-stage cost-sensitive learning method based on the multi-class decision-theoretic rough set model, in which several classification stages eventually eliminate the boundary and negative regions and turn three-way decision classification into two-way decision classification. Experimental results show that the proposed algorithm has good classification performance.
Third, text classification based on the multi-class decision-theoretic rough set model. Text classification has been a research hotspot in recent years. Using Sogou Chinese text as the corpus, this thesis trains a text classifier with the multi-stage cost-sensitive learning algorithm based on the multi-class decision-theoretic rough set model and compares it experimentally with several commonly used machine learning classification algorithms. The results show that the proposed method achieves higher precision and recall and a lower classification cost in text classification, which further highlights the cost sensitivity of the proposed algorithm and extends the application of decision-theoretic rough sets to practical problems.
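The abstract notes that, in the two-class case, decision thresholds follow directly from the loss function matrix. As a minimal sketch of that step, the Python below uses the classical two-class decision-theoretic rough set formulas for the thresholds alpha and beta and applies the resulting three-way rule. It is not the thesis's multi-class model, which the abstract does not spell out. The names `LossMatrix`, `thresholds`, and `three_way_decision` and the loss values in the usage note are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LossMatrix:
    """Losses for the actions accept (P), defer (B) and reject (N).

    Hypothetical container: fields ending in 'p' apply when the object
    belongs to the class, fields ending in 'n' when it does not.
    """
    pp: float   # accept, object in class
    bp: float   # defer, object in class
    np_: float  # reject, object in class
    pn: float   # accept, object not in class
    bn: float   # defer, object not in class
    nn: float   # reject, object not in class


def thresholds(loss: LossMatrix) -> Tuple[float, float]:
    """Return (alpha, beta) from the classical two-class formulas.

    Assumes pp <= bp < np_ and nn <= bn < pn, plus the usual condition
    that makes alpha > beta, so a boundary region exists.
    """
    alpha = (loss.pn - loss.bn) / ((loss.pn - loss.bn) + (loss.bp - loss.pp))
    beta = (loss.bn - loss.nn) / ((loss.bn - loss.nn) + (loss.np_ - loss.bp))
    return alpha, beta


def three_way_decision(prob: float, alpha: float, beta: float) -> str:
    """Map a conditional probability Pr(class | object) to a region."""
    if prob >= alpha:
        return "positive"   # accept: assign the class label
    if prob <= beta:
        return "negative"   # reject
    return "boundary"       # defer until more information is available
```

With the illustrative losses `LossMatrix(pp=0, bp=2, np_=6, pn=8, bn=2, nn=0)`, the formulas give alpha = 6/8 = 0.75 and beta = 2/6, so an object with probability 0.5 falls in the boundary region and its decision is deferred.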
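[Degree-granting institution]: Nanjing University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP18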
Article ID: 1378726
Article link: https://www.wllwen.com/kejilunwen/zidonghuakongzhilunwen/1378726.html