Research on Models and Algorithms for Semi-Supervised Support Vector Machines

Published: 2018-01-03 15:09

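  Keywords of this article: Research on Models and Algorithms for Semi-Supervised Support Vector Machines. Source: Shanghai University (上海大学), 2016 doctoral dissertation. Document type: degree thesis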


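  Related topics: semi-supervised learning; support vector machine; quadratic surface support vector machine; kernel-free; classification problem; convex relaxation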


[Abstract]: The support vector machine (SVM) is a machine learning method for small-sample classification problems, derived from the structural risk minimization principle of statistical learning theory. Because it attains a globally optimal solution and generalizes well, it has been widely applied in currently active areas such as compressed sensing, sparse optimization, pattern recognition, feature extraction, image processing, and medical diagnosis. The semi-supervised support vector machine (S3VM) is a learning method that uses labeled and unlabeled samples together. In practice a large number of unlabeled samples and only a small number of labeled samples are usually easy to obtain, so S3VMs are widely used for large-scale data recognition and classification. The main challenges are that the underlying mathematical model is a hard optimization problem and that, for nonlinear classification, choosing a kernel function is time-consuming and creates computational difficulties. Research on models and algorithms for semi-supervised classification therefore has both theoretical significance and broad practical value. This doctoral dissertation studies new S3VM classification models and algorithms, and tests the classification performance of the new methods on artificial data sets and on benchmark data sets from classification databases.

First, to address the difficulty of the optimization problem underlying the S3VM model, and to study the S3VM model with the squared hinge loss function, two conic relaxation methods are proposed. The optimization problem underlying the S3VM model is a mixed-integer program. A new semidefinite relaxation is proposed, and the maximum ratio between the optimal value of the original problem and that of the relaxation is estimated, that is, how well the relaxation approximates the original problem. Next, a completely positive cone program equivalent to the original mixed-integer program is constructed. Because that problem is in general NP-hard, it is relaxed further to obtain a doubly nonnegative relaxation. Compared with the semidefinite relaxation, the optimal value of the doubly nonnegative relaxation is a tighter lower bound on the optimal value of the original problem. Finally, the two relaxations are solved with the convex optimization toolbox CVX and an alternating direction algorithm. The numerical results show that both relaxation methods achieve high classification accuracy, and that the doubly nonnegative relaxation classifies better than the semidefinite relaxation.

Second, because selecting a suitable kernel function is difficult and time-consuming, a kernel-free semi-supervised quadratic surface support vector machine model is proposed for the first time. The model is a mixed-integer program, which is in general NP-hard. It is first reformulated equivalently as a nonconvex optimization problem with absolute-value constraints, and then relaxed by a vector lifting technique into a semidefinite program that is computable in polynomial time and is solved with CVX. Numerical experiments show that the semi-supervised quadratic surface SVM attains higher classification accuracy than traditional S3VM methods and supervised SVM methods. The results indicate that the kernel-free classification model is effective and that training on labeled and unlabeled samples together improves classification performance. A drawback of the method is that it tends to run out of memory when the data set is large.

Finally, to address the long computation time and large memory requirement of the semi-supervised quadratic surface SVM, a kernel-free semi-supervised "central" (中心) quadratic surface support vector machine model is proposed. The model exploits the structural advantages of the central support vector machine (中心支持向量机) to simplify the optimization problem of the semi-supervised quadratic surface SVM into a mixed-integer program with only equality constraints, which is in general NP-hard. To solve it approximately, a semidefinite relaxation is applied first and linear matrix inequality constraints are then added, which relaxes the original problem into a semidefinite program. A primal alternating direction algorithm is designed to solve the relaxed problem. The numerical results show that, compared with the semi-supervised quadratic surface SVM, the method effectively improves computational efficiency and classification accuracy, and that both the labels of the labeled samples and the features of the unlabeled samples strongly affect classification accuracy.
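The step from a mixed-integer model to a formulation with absolute values, mentioned in the abstract, can be illustrated on a toy example. The following is a minimal sketch and not the dissertation's model: it assumes the commonly used quadratic surface form f(x) = 0.5 x^T W x + b^T x + c and a squared hinge loss on the unlabeled points, and every function name and data value below is hypothetical. It checks numerically that minimizing the loss over all label assignments in {-1, +1} gives the same value as the label-free expression max(0, 1 - |f(x)|)^2.

```python
from itertools import product


def quad_surface(x, W, b, c):
    """Evaluate f(x) = 0.5 * x^T W x + b^T x + c for a symmetric matrix W."""
    n = len(x)
    quad = sum(W[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    lin = sum(b[i] * x[i] for i in range(n))
    return 0.5 * quad + lin + c


def sq_hinge(y, fx):
    """Squared hinge loss max(0, 1 - y * f(x))^2 for a label y in {-1, +1}."""
    return max(0.0, 1.0 - y * fx) ** 2


def unlabeled_loss_by_enumeration(points, W, b, c):
    """Minimize the total loss over all 2^u label assignments (the integer part)."""
    scores = [quad_surface(x, W, b, c) for x in points]
    return min(
        sum(sq_hinge(y, s) for y, s in zip(labels, scores))
        for labels in product((-1, 1), repeat=len(points))
    )


def unlabeled_loss_abs_form(points, W, b, c):
    """Same quantity without label variables: sum of max(0, 1 - |f(x)|)^2."""
    return sum(
        max(0.0, 1.0 - abs(quad_surface(x, W, b, c))) ** 2 for x in points
    )


# Hypothetical toy surface: the unit circle x1^2 + x2^2 = 1,
# written as W = 2I, b = 0, c = -1 in the assumed parameterization.
W = [[2.0, 0.0], [0.0, 2.0]]
b = [0.0, 0.0]
c = -1.0
unlabeled = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0), (2.0, 0.0)]

enum_val = unlabeled_loss_by_enumeration(unlabeled, W, b, c)
abs_val = unlabeled_loss_abs_form(unlabeled, W, b, c)
print(enum_val, abs_val)
```

The enumeration grows as 2^u in the number of unlabeled points, which is one way to see why such models are relaxed rather than solved by exhaustive search. For a fixed surface the absolute-value form removes the label variables but is nonconvex in (W, b, c).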

[Degree-granting institution]: Shanghai University (上海大学)
[Degree level]: Doctoral
[Year degree conferred]: 2016
[Classification number]: TP181




Article ID: 1374408


Article link: https://www.wllwen.com/shoufeilunwen/xxkjbs/1374408.html

