
Research on Imbalanced Classification under the Transfer Learning Framework

Published: 2018-11-09 10:19
[Abstract]: Transfer learning is an emerging framework in machine learning that relaxes two basic assumptions of traditional machine learning, and it has received wide attention in recent years. Existing work on imbalanced classification under the transfer learning framework focuses mainly on single-source transfer, where little transferable information is available and "negative transfer" may even occur. To address these shortcomings, this thesis introduces a multi-source transfer mechanism and studies multi-source transfer learning for imbalanced classification.
First, for binary-classification transfer learning problems in which the target and source domains have similar data distributions and the positive and negative samples are imbalanced, the thesis proposes MSTUSC, an ensemble transfer learning algorithm for imbalanced sample classification based on multi-source data. The method introduces data from multiple source domains to avoid negative transfer, adopts new initial sample weights and a new sample-weight update strategy to handle the class imbalance, and uses a redundant-sample elimination mechanism that discards redundant data from the source domains at appropriate times, which reduces the algorithm's time and space overhead. Experiments on UCI benchmark datasets, with F1 and AUC as evaluation metrics, show that MSTUSC classifies imbalanced data better than the several transfer learning algorithms it was compared with.
Second, to improve the time efficiency of MSTUSC, the thesis proposes DMSTUSC, a distributed ensemble transfer learning algorithm for imbalanced sample classification based on multi-source data. Each source domain is assigned to one node of a distributed system, and on each node a single-source ensemble transfer learning algorithm for imbalanced sample classification is trained to obtain a classification model. The models trained on all nodes are then combined to give the multi-source ensemble classifier. Experimental analysis shows that DMSTUSC is markedly more time-efficient than MSTUSC.
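The abstract does not give the formulas behind MSTUSC's initial weights, weight updates, or redundant-sample elimination, so the following is only a minimal sketch of the general idea under stated assumptions. It uses class-balanced initial weights and the classic TrAdaBoost update rule as stand-ins, plus a simple weight threshold for dropping source samples. The function names `balanced_initial_weights`, `tradaboost_style_update`, and `keep_non_redundant`, and the `threshold` parameter, are hypothetical and do not come from the thesis.

```python
import numpy as np

def balanced_initial_weights(y):
    """Assumed scheme: every class gets the same total weight,
    so each minority-class sample starts heavier than a majority one."""
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    per_sample = dict(zip(classes, 1.0 / (len(classes) * counts)))
    return np.array([per_sample[label] for label in y])

def tradaboost_style_update(w_src, w_tgt, miss_src, miss_tgt, n_rounds):
    """One round of the classic TrAdaBoost update (a stand-in, not the
    thesis's own rule): misclassified source samples lose weight,
    misclassified target samples gain weight."""
    w_src = np.asarray(w_src, dtype=float)
    w_tgt = np.asarray(w_tgt, dtype=float)
    miss_src = np.asarray(miss_src, dtype=float)  # 1.0 where misclassified
    miss_tgt = np.asarray(miss_tgt, dtype=float)

    # Weighted error of the current weak learner on the target domain.
    eps = np.sum(w_tgt * miss_tgt) / np.sum(w_tgt)
    eps = np.clip(eps, 1e-10, 0.499)  # keep beta_tgt inside (0, 1)

    beta_tgt = eps / (1.0 - eps)
    beta_src = 1.0 / (1.0 + np.sqrt(2.0 * np.log(len(w_src)) / n_rounds))

    new_src = w_src * beta_src ** miss_src
    new_tgt = w_tgt * beta_tgt ** (-miss_tgt)
    total = new_src.sum() + new_tgt.sum()
    return new_src / total, new_tgt / total

def keep_non_redundant(w_src, threshold):
    """Hypothetical elimination rule: keep only the source samples
    whose weight is still at or above the threshold."""
    return np.flatnonzero(np.asarray(w_src) >= threshold)
```

With multiple source domains, a loop of this kind would be run once per source (or, in the distributed setting the abstract describes, once per node), and the resulting classifiers combined afterwards; how the thesis combines them is not stated in the abstract.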
[Degree-granting institution]: Anhui University of Technology
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP181





