
Research on Domain Adaptation Based on Deep Learning

Published: 2018-08-19 06:33
[Abstract]: In computer vision, collecting fully annotated data is expensive, and standard supervised learning generalizes poorly when labeled data are scarce, so it cannot meet practical needs. Domain adaptation, an emerging machine learning approach, aims to train a classifier on source-domain data with abundant labels and apply it to a target domain with few or no labels. At present, cross-domain transfer learning performs poorly mainly because of three problems: negative transfer, under-adaptation, and under-fitting. A more challenging case arises when the source and target domains lie in heterogeneous feature spaces, which makes transfer even harder. To address these problems, the main research content of this thesis is as follows.

First, for homogeneous domain adaptation, to learn effective features and reduce the distribution difference between domains as far as possible, thereby alleviating under-adaptation, a domain adaptation network based on autoencoders is proposed. Source-domain and target-domain samples each pass through two layers of encoding and decoding, and the reconstruction error is minimized to learn a more effective feature representation. The maximum mean discrepancy criterion is then applied at both the feature extraction layer and the classification layer to match the marginal and conditional distributions of the two domains simultaneously, minimizing the distribution difference, and a softmax classifier encodes the source label information to improve classification performance. Finally, the network parameters are learned by gradient descent, and the unlabeled target-domain samples are predicted from the classifier output.
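The distribution-matching step can be illustrated with a minimal sketch of the maximum mean discrepancy. The abstract does not state the kernel, the estimator, or how target labels are obtained for the conditional term, so everything below is an assumption made for illustration: an RBF kernel with a hypothetical bandwidth `gamma`, the biased empirical estimate of the squared discrepancy, and target pseudo-labels (for example, taken from the current classifier output) for the per-class term. The function names are hypothetical and not taken from the thesis.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel between two feature vectors (kernel choice is an assumption)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mmd2(source, target, gamma=1.0):
    """Biased empirical estimate of the squared MMD between two samples."""
    def mean_kernel(a, b):
        total = sum(rbf_kernel(x, y, gamma) for x in a for y in b)
        return total / (len(a) * len(b))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))


def conditional_mmd2(source, source_labels, target, target_pseudo_labels, gamma=1.0):
    """Average per-class squared MMD; the target side uses pseudo-labels (assumption)."""
    classes = sorted(set(source_labels) & set(target_pseudo_labels))
    if not classes:
        return 0.0
    total = 0.0
    for c in classes:
        xs = [x for x, y in zip(source, source_labels) if y == c]
        xt = [x for x, y in zip(target, target_pseudo_labels) if y == c]
        total += mmd2(xs, xt, gamma)
    return total / len(classes)


if __name__ == "__main__":
    src = [[0.0, 0.0], [0.1, 0.0]]
    tgt = [[3.0, 3.0], [3.1, 3.0]]
    print("marginal MMD^2:", mmd2(src, tgt))
    print("conditional MMD^2:", conditional_mmd2(src, [0, 1], tgt, [0, 1]))
```

In a network like the one described, a term of this kind would be computed on the hidden features and on the classifier outputs and added to the reconstruction and softmax losses. How the thesis weights these terms is not given in the abstract.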
Second, for homogeneous domain adaptation, a learned model that fails to adequately describe the probability distribution of the data to be predicted causes both under-fitting and under-adaptation, and introducing an ordinary graph regularization term does not sufficiently alleviate negative transfer. A domain adaptation network based on a hypergraph-regularized denoising autoencoder is therefore proposed. First, a denoising autoencoder extracts more robust features to reduce under-fitting. Second, the maximum mean discrepancy criterion matches the marginal and conditional distributions of the two domains simultaneously to address under-adaptation. Then, a hypergraph regularization term built from the relationships among source-domain and target-domain samples is introduced to address negative transfer, and the classifier loss function is obtained from the true labels of the source domain. Finally, the network parameters are learned by gradient descent to classify the target-domain samples.

Third, for heterogeneous domain adaptation, shallow structures cannot fit the data distribution well or yield an effective feature representation, nor do they simultaneously account for matching the data distributions between domains and keeping geometric structure and labels consistent. A heterogeneous domain adaptation network based on autoencoders is therefore proposed. First, two sets of autoencoders map the source-domain and target-domain data into a shared feature space, and the maximum mean discrepancy criterion matches the marginal and conditional distributions of the two domains simultaneously. Second, a manifold alignment term is introduced: its geometric term preserves the geometric structure of the data within each domain, while its similarity and dissimilarity terms preserve the consistency of label information across domains. Then, the label information of the source and target domains is used to obtain the softmax classifier loss term. Finally, the network parameters are learned by gradient descent to classify the unlabeled target-domain samples.

Comparative experiments on multiple datasets show that, compared with traditional homogeneous and heterogeneous domain adaptation methods, the proposed models achieve better classification performance and effectively address cross-domain knowledge transfer.
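As an illustration of the hypergraph regularization term used in the second model, the sketch below computes a smoothness penalty over hyperedges. The abstract does not say how the hypergraph is built or which Laplacian is used, so this is an assumed construction: one hyperedge per sample, containing the sample and its `k` nearest neighbours among the pooled source and target features, with unit hyperedge weights and the pairwise form of the normalized hypergraph Laplacian penalty. The names `knn_hyperedges` and `hypergraph_regularizer` are hypothetical.

```python
import math


def knn_hyperedges(points, k):
    """One hyperedge per sample: the sample plus its k nearest neighbours (assumed construction)."""
    edges = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        edges.append([i] + [j for _, j in dists[:k]])
    return edges


def hypergraph_regularizer(features, edges, weights=None):
    """Pairwise form of the normalized hypergraph Laplacian penalty.

    Sums, over every hyperedge and every ordered vertex pair inside it,
    0.5 * w(e) / |e| * ||f_u / sqrt(d_u) - f_v / sqrt(d_v)||^2,
    where d_u is the total weight of the hyperedges containing u.
    """
    weights = weights or [1.0] * len(edges)
    degree = [0.0] * len(features)
    for edge, w in zip(edges, weights):
        for v in edge:
            degree[v] += w

    total = 0.0
    for edge, w in zip(edges, weights):
        for u in edge:
            for v in edge:
                sq = sum(
                    (fu / math.sqrt(degree[u]) - fv / math.sqrt(degree[v])) ** 2
                    for fu, fv in zip(features[u], features[v])
                )
                total += 0.5 * w / len(edge) * sq
    return total


if __name__ == "__main__":
    pooled = [[0.0], [1.0], [10.0]]  # toy pooled source and target features
    edges = knn_hyperedges(pooled, k=1)
    print("hyperedges:", edges)
    print("penalty:", hypergraph_regularizer(pooled, edges))
```

Unlike an ordinary graph, where each edge links exactly two samples, a hyperedge can group several samples at once, so the penalty pulls together the features of every sample that shares a hyperedge.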
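[Degree-granting institution]: China University of Mining and Technology
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP391.41; TP181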

