A Study of Data Normalization for SVM Training
Published: 2018-09-12 12:00
[Abstract]: Data normalization is a necessary preprocessing step when training a support vector machine (SVM). Commonly used methods include scaling to [-1, +1] and standardizing to N(0, 1), but the existing literature offers no study of the scientific basis for these methods. This paper investigates empirically why data should be normalized and how normalizing or not normalizing affects training efficiency and the predictive ability of the model. Using standard data sets, SVMs were trained on the original unnormalized data, on data normalized by different methods, on artificially inverse-normalized data, and on arbitrarily selected attribute columns, recording the objective function value against the number of iterations, the training time, the model test results, and k-fold cross-validation (k-CV) performance. The experiments show that normalization methods that restrict data values to a conventional range, such as [-0.5, +0.5] through [-5, +5] or N(0, 1) through N(0, 5), all yield the best prediction model with the shortest training time. This work provides a scientific basis for data normalization for SVMs and for machine learning algorithms in general.
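The two families of methods named in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the function names `scale_to_range` and `standardize` are hypothetical, and reading the second parameter of N(0, 5) as a standard deviation (rather than a variance) is an assumption, since the abstract does not say which is meant.

```python
from statistics import mean, pstdev


def scale_to_range(column, low=-1.0, high=1.0):
    """Linearly map one feature column onto [low, high] (min-max scaling)."""
    c_min, c_max = min(column), max(column)
    if c_max == c_min:
        # Constant column: place every value at the middle of the target range.
        return [(low + high) / 2.0 for _ in column]
    span = c_max - c_min
    return [low + (high - low) * (x - c_min) / span for x in column]


def standardize(column, sigma=1.0):
    """Shift a column to zero mean and rescale it to standard deviation sigma.

    Assumption: sigma is a standard deviation, so sigma=1.0 corresponds
    to what the abstract writes as N(0, 1).
    """
    mu = mean(column)
    sd = pstdev(column)  # population standard deviation
    if sd == 0:
        # Constant column: nothing to rescale.
        return [0.0 for _ in column]
    return [sigma * (x - mu) / sd for x in column]


raw = [2.0, 4.0, 6.0, 10.0]      # made-up values for one attribute column
scaled = scale_to_range(raw)      # values now lie in [-1, +1]
zscored = standardize(raw)        # zero mean, unit standard deviation
```

In practice the minimum, maximum, mean, and standard deviation would normally be computed on the training set only and then reused to transform the test set.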
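[Author affiliations]: School of Information Science and Engineering, Shandong Normal University; Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Shandong Normal University; Laboratory and Equipment Management Office, Shandong Normal University
[Classification number]: TP181
Article ID: 2238937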
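[Related literature]
Related journal articles (top 1)
1 Liu Huimin; Wang Hongqiang; Li Xiang; Data normalization for target recognition based on the RPROP algorithm [J]; Modern Radar; 2009, Issue 05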
Article link: https://www.wllwen.com/kejilunwen/zidonghuakongzhilunwen/2238937.html