Individual Convergence of the Projected Subgradient Method with Nesterov's Step-Size Strategy

Published: 2018-02-27 12:38

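Keywords: machine learning; nonsmooth loss function problems; projected subgradient method; Nesterov step-size strategy; individual convergence rate; sparse learning. Source: Chinese Journal of Computers (《计算机学报》), 2018, Issue 01. Article type: journal article.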

[Abstract]: Many machine learning problems can ultimately be cast as optimization problems, and convex optimization algorithms have been applied successfully to a wide range of them. A fundamental question in the study of any optimization algorithm is whether it attains the optimal convergence rate. In sparse learning, the sparsity of the solution is a second goal. Many stochastic optimization methods have been proposed for large-scale machine learning problems, but most of them achieve the optimal rate only for the averaged output. The individual output (the iterate itself) is clearly sparser than the averaged output, yet making its convergence rate optimal is difficult: even in the strongly convex case, optimal individual convergence has been studied extensively as an open problem. For smooth objective functions, Nesterov proposed a step-size strategy that improves the order of the convergence rate of gradient methods (from O(1/t) to O(1/t^2) for smooth convex problems) and yields the optimal individual convergence rate. Nesterov's accelerated method has since been applied to many machine learning problems with smooth loss functions, and researchers have built numerous stochastic optimization algorithms on this acceleration strategy. Whether the technique extends to the nonsmooth case with an optimal individual convergence rate is therefore a meaningful question. This paper introduces the step-size strategy into nonsmooth optimization algorithms. Focusing on the classical first-order gradient method, we propose a projected subgradient algorithm that embeds the step-size strategy of the accelerated method, and we prove that it attains the optimal individual convergence rate on learning problems with nonsmooth loss functions.

This result is stronger than the one for the standard projected subgradient method, which is optimal only for the averaged output, and it comes close to the individual optimal convergence result that the community has long expected of first-order gradient methods. Compared with the averaged-output and linear-interpolation variants of the projected subgradient method, the proposed method performs its gradient operation after the interpolation step, so it yields sparser solutions on hinge-loss learning problems with an l1-norm constraint. Experiments on synthetic datasets confirm the correctness of the method, and experiments on benchmark datasets show that it preserves sparsity well.
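The abstract does not give the update formulas. The following Python code is therefore only a minimal sketch under stated assumptions, not the paper's algorithm. It targets an l1-ball-constrained hinge-loss problem and contrasts two methods:

- `projected_subgradient_averaged`, a standard projected subgradient method that returns the averaged iterate.
- `projected_subgradient_extrapolated`, a variant that first forms a Nesterov-style extrapolated point and then takes the projected subgradient step there. It returns the last (individual) iterate.

All function names are hypothetical. The momentum weight `beta = (t - 1) / (t + 2)` and the step size `c / sqrt(t)` are assumed choices, and no convergence rate is claimed for them. The l1-ball projection uses the standard sort-and-threshold method.

```python
import math
import random


def project_l1_ball(v, radius):
    """Euclidean projection of v onto the l1 ball {x : ||x||_1 <= radius}, radius > 0."""
    if sum(abs(a) for a in v) <= radius:
        return list(v)  # already feasible
    u = sorted((abs(a) for a in v), reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        candidate = (cumsum - radius) / j
        if uj > candidate:
            theta = candidate  # keeps the threshold of the largest valid index
    # Soft-thresholding: small coordinates become exactly zero.
    return [math.copysign(max(abs(a) - theta, 0.0), a) for a in v]


def hinge_subgradient(w, X, y):
    """One subgradient of f(w) = (1/n) * sum_i max(0, 1 - y_i * <w, x_i>)."""
    n = len(X)
    g = [0.0] * len(w)
    for xi, yi in zip(X, y):
        if yi * sum(wj * xj for wj, xj in zip(w, xi)) < 1.0:  # hinge term is active
            for j, xj in enumerate(xi):
                g[j] -= yi * xj / n
    return g


def projected_subgradient_averaged(X, y, radius, T, c=1.0):
    """Standard projected subgradient method; returns the average of the iterates."""
    w = [0.0] * len(X[0])
    avg = [0.0] * len(w)
    for t in range(1, T + 1):
        g = hinge_subgradient(w, X, y)
        step = c / math.sqrt(t)
        w = project_l1_ball([wj - step * gj for wj, gj in zip(w, g)], radius)
        avg = [a + (wj - a) / t for a, wj in zip(avg, w)]  # running mean
    return avg


def projected_subgradient_extrapolated(X, y, radius, T, c=1.0):
    """Hypothetical sketch: extrapolate first, then take the projected subgradient
    step at the extrapolated point; returns the last (individual) iterate."""
    w_prev = [0.0] * len(X[0])
    w = [0.0] * len(X[0])
    for t in range(1, T + 1):
        beta = (t - 1) / (t + 2)  # assumed Nesterov-type momentum weight
        z = [wj + beta * (wj - pj) for wj, pj in zip(w, w_prev)]
        g = hinge_subgradient(z, X, y)  # subgradient is evaluated after extrapolation
        step = c / math.sqrt(t)  # assumed O(1/sqrt(t)) step size for the nonsmooth case
        w_prev, w = w, project_l1_ball([zj - step * gj for zj, gj in zip(z, g)], radius)
    return w


if __name__ == "__main__":
    rng = random.Random(0)
    d, n = 20, 100
    w_true = [1.0, -1.0] + [0.0] * (d - 2)  # only the first two features matter
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = [1.0 if sum(a * b for a, b in zip(w_true, x)) >= 0 else -1.0 for x in X]
    for name, solver in [("averaged", projected_subgradient_averaged),
                         ("individual", projected_subgradient_extrapolated)]:
        w = solver(X, y, radius=2.0, T=200)
        nonzeros = sum(1 for a in w if abs(a) > 1e-8)
        print(f"{name} output: {nonzeros} nonzero coordinates out of {d}")
```

The ordering is the point of the sketch. The individual output is itself the result of an l1-ball projection, so soft-thresholding can set coordinates exactly to zero. An average of iterates, by contrast, generally keeps a coordinate nonzero if any of the averaged iterates touched it. The printed nonzero counts let one check this on the synthetic data.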
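[Affiliations]: School of Command Information Systems, Army Engineering University of PLA; Eleventh Department, Army Academy of Artillery and Air Defense of PLA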
[Funding]: Supported by the National Natural Science Foundation of China (61273296, 61673394)
[Chinese Library Classification number]: TP181
