
Research on a Graph-Based SVM Classification Algorithm under the Semi-Supervised Learning Framework

Published: 2018-02-26 02:18

  Keywords: SVM; semi-supervised classification; pseudo-labels; LRR graph; denoising. Source: North Minzu University, 2017 master's thesis. Type: degree thesis


[Abstract]: In machine learning, the support vector machine (SVM) is one of the earlier supervised learning algorithms. It addresses problems such as overfitting and the "curse of dimensionality" that affected early neural networks, and it has been applied successfully in many fields. Semi-supervised learning makes effective use of both labeled and unlabeled samples and fully exploits the cluster structure of the whole sample set. Compared with supervised classification, it needs fewer labeled samples while still performing well. Among semi-supervised approaches, graph-based semi-supervised learning is currently the most popular. This thesis proposes a graph-model-based SVM classification algorithm under the semi-supervised learning framework, which brings the feature information of unlabeled samples into training in order to further improve the classification accuracy of the SVM. First, a graph-based semi-supervised learning method assigns pseudo-labels to the unlabeled samples; the pseudo-labeled samples and the labeled samples are then fed to the SVM together. Because the generated pseudo-labeled samples may contain noise, the pseudo-labeled set is denoised first, so that noisy samples do not weaken the benefit of enlarging the labeled set. Moreover, the more accurate the pseudo-labels, the fewer the noisy samples, the more valuable the sample information, and the smaller the workload. Therefore, in the preprocessing stage that enlarges the number of labeled samples in the training set, the thesis selects by experimental comparison a graph model with high classification accuracy and good performance, and combines it with the SVM to complete the experiments. The main work is as follows:
(1) In the first stage, experiments and analysis are carried out on UCI datasets and the USPS handwriting dataset for the exponential-weight graph (EW), the k-nearest-neighbor graph (kNN), the ℓ1-norm graph (LN), and the low-rank representation graph (LRR). Each graph model is combined with the Gaussian fields and harmonic functions (GHF) propagation algorithm to complete the classification experiments, and the LRR graph is finally selected for sample preprocessing.
(2) In the second stage, after the LRR graph has assigned pseudo-labels, a k-nearest-neighbor graph algorithm compares label values to remove noisy samples. Experiments on UCI datasets and the USPS handwriting dataset show that, when labeled samples are scarce, the proposed algorithm exploits the sample distribution of the whole sample set better than the traditional SVM, turns the SVM into a new semi-supervised learning algorithm with an expandable sample set, and achieves higher final classification accuracy.
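The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: for brevity it swaps the LRR graph for a dense exponential-weight (Gaussian) graph, solves the harmonic function by simple iteration, and assumes that "comparing label values" means keeping a pseudo-labeled sample only when its label matches the majority of its k nearest neighbors. The function names, `sigma`, `k`, `iters`, and the toy data are all hypothetical.

```python
import math

def gaussian_weights(points, sigma=1.0):
    """Dense exponential-weight graph: w_ij = exp(-||xi - xj||^2 / (2 sigma^2))."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            w[i][j] = w[j][i] = math.exp(-d2 / (2 * sigma ** 2))
    return w

def harmonic_scores(w, labeled, iters=200):
    """Iterate f_i = sum_j w_ij f_j / sum_j w_ij on unlabeled nodes.

    Labeled nodes stay clamped to their 0/1 label, so the iteration
    approaches the harmonic solution on the graph.
    """
    n = len(w)
    f = [float(labeled.get(i, 0.5)) for i in range(n)]
    for _ in range(iters):
        for i in range(n):
            if i in labeled:
                continue
            total = sum(w[i])
            if total > 0:
                f[i] = sum(w[i][j] * f[j] for j in range(n)) / total
    return f

def knn_denoise(w, hard, labeled, k=3):
    """Assumed denoising rule: keep an unlabeled sample only if its
    pseudo-label agrees with the majority label of its k nearest neighbors."""
    keep = []
    n = len(w)
    for i in range(n):
        if i in labeled:
            continue
        # Largest edge weight means smallest distance.
        nbrs = sorted((j for j in range(n) if j != i), key=lambda j: -w[i][j])[:k]
        votes = sum(hard[j] for j in nbrs)
        majority = 1 if 2 * votes > k else 0
        if hard[i] == majority:
            keep.append(i)
    return keep

# Toy data: two well-separated 2-D clusters, one labeled sample in each.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2),
          (3.0, 3.0), (3.2, 3.1), (2.9, 3.3), (3.1, 2.8)]
labeled = {0: 0, 4: 1}

w = gaussian_weights(points)
scores = harmonic_scores(w, labeled)
hard = [1 if s >= 0.5 else 0 for s in scores]  # pseudo-labels
kept = knn_denoise(w, hard, labeled)           # pseudo-labeled samples retained
train_idx = sorted(labeled) + kept             # enlarged training set for the SVM
print(hard, kept)
```

In a full pipeline, the samples indexed by `train_idx`, with their true or pseudo labels, would then be passed to an SVM trainer (for example scikit-learn's `SVC`, if that library is available). That step is left out here to keep the sketch dependency-free.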

[Degree-granting institution]: North Minzu University
[Degree level]: Master's
[Year conferred]: 2017
[Classification number]: TP181




Article ID: 1536118


Article link: https://www.wllwen.com/kejilunwen/zidonghuakongzhilunwen/1536118.html

