Research on Data Normalization Methods for Improving SVM Training Efficiency
[Abstract]: The Support Vector Machine (SVM) is a machine learning method based on statistical learning theory, the structural risk minimization principle, and VC dimension theory. Owing to its excellent classification ability, it has been widely used in many fields in recent decades and remains one of the most active research topics in machine learning. Data normalization is a necessary preprocessing step for SVM training, and commonly used strategies include scaling to [-1, +1] and standardizing to N(0, 1). However, no scientific basis for these commonly used normalization methods has been found in the existing literature. This paper studies SVM training with the sequential minimal optimization (SMO) algorithm and the Gaussian kernel function. It finds that the Gaussian kernel is affected by the attribute values of the data samples: attribute values that are too large or too small reduce the participation of the Gaussian kernel, and the separating hyperplane becomes too rugged. Through empirical experiments, the paper explores the internal mechanism of data normalization and compares the effects of normalized and non-normalized data on training efficiency and model prediction ability. The data are trained with SVM, and the change of the objective function value with the number of iterations, the training time, the model test results, and the k-fold cross-validation (k-CV) performance are recorded. (1) The algorithm is implemented in C++11, including the calculation and output of the objective function value, its change between iterations, the training time, and the test accuracy. Typical research literature on the SMO algorithm with the Gaussian kernel function is analyzed in depth, and the optimal value of the Gaussian kernel radius and the tolerance for violation of the KKT conditions are determined.
The results show that the determined values of lambda and kappa achieve the best generalization ability, and analysis of the curves of the output data supports the conclusion that data preprocessing improves the training efficiency of SVM. (2) Data preprocessing methods are studied in depth, and three different normalization methods, maximum-value normalization, median normalization, and standard-score normalization, are applied to the SVM classifier. The experimental results show that data normalization can compensate for the shortcomings of the kernel radius of the Gaussian kernel function and make the Gaussian kernel better suited to SVM classification. (3) On standard experimental data sets, the three normalization methods are used to preprocess the data for SVM, and a variety of experiments are designed. Training time and test accuracy are recorded and compared in detail using k-CV. (4) By analyzing the effect of data normalization on SVM training efficiency and comparing the differences in classification ability, an optimal criterion for data normalization that improves SVM training efficiency is proposed: the values of each data attribute should be kept within a conventional, comparable range, such as [-0.5, +0.5] ~ [-5, +5], N (0,1) ~ N (0,0) N (0). Extensive experimental analysis and verification show that data normalization can effectively improve the training efficiency of SVM. This paper provides a scientific basis for data normalization in SVM and in general machine learning algorithms.
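The mechanism described in the abstract, where large attribute values reduce the participation of the Gaussian kernel, can be illustrated with a minimal Python sketch. The thesis itself was implemented in C++11 and its code is not shown here, so everything below is an assumption made for illustration: the toy data, the function names, and the kernel radius `sigma = 1.0`. Only the [-1, +1] and N(0, 1) strategies named in the abstract are sketched; median normalization is left out because the abstract does not define it.

```python
import math
import statistics


def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian (RBF) kernel: exp(-||x - z||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def min_max_scale(column, lo=-1.0, hi=1.0):
    """Linearly map one attribute column onto [lo, hi]."""
    c_min, c_max = min(column), max(column)
    span = c_max - c_min
    if span == 0:
        # Constant attribute: place every value at the interval midpoint.
        return [0.5 * (lo + hi) for _ in column]
    return [lo + (hi - lo) * (v - c_min) / span for v in column]


def z_score_scale(column):
    """Standardize one attribute column to zero mean and unit variance."""
    mean = statistics.mean(column)
    std = statistics.pstdev(column)
    if std == 0:
        return [0.0 for _ in column]
    return [(v - mean) / std for v in column]


def normalize(rows, scale_column):
    """Apply a per-attribute scaling function to a list of sample rows."""
    columns = [list(col) for col in zip(*rows)]
    scaled = [scale_column(col) for col in columns]
    return [list(row) for row in zip(*scaled)]


def gram_matrix(rows, sigma=1.0):
    """Pairwise kernel values between all samples."""
    return [[gaussian_kernel(a, b, sigma) for b in rows] for a in rows]


# Hypothetical toy samples: the first attribute is in the thousands,
# the second lies between 0 and 1.
raw = [[1500.0, 0.2], [1520.0, 0.9], [3000.0, 0.3]]

for name, data in [
    ("raw", raw),
    ("min-max [-1, +1]", normalize(raw, min_max_scale)),
    ("z-score N(0, 1)", normalize(raw, z_score_scale)),
]:
    print(name)
    for row in gram_matrix(data):
        print("  ", [round(k, 4) for k in row])
```

On the raw data the off-diagonal kernel values collapse to essentially zero, so every sample looks unrelated to every other one and the kernel contributes almost nothing. After either normalization the same pairs yield intermediate kernel values, which is consistent with the abstract's claim that keeping attributes in a comparable range lets the Gaussian kernel participate in training.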
[Degree-granting institution]: Shandong Normal University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP181
Article ID: 2181215
Article link: https://www.wllwen.com/kejilunwen/zidonghuakongzhilunwen/2181215.html