
A Feature Selection Method Based on the Correlation of the Original Data

Published: 2018-04-02 08:40

  Topic: Lasso  Focus: least angle regression  Source: Lanzhou University, 2017 master's thesis


[Abstract]: In feature selection problems, Lasso, least angle regression, and stepwise regression (such as forward stepwise regression) can all describe the feature selection process, but the process each of them yields has shortcomings. Least angle regression and modified least angle regression describe only the points at which variables enter or leave the model; the solutions between those points are not known, so least angle regression gives an incomplete picture of the sparsification process. Stepwise regression tends to miss parts of the process if the forward step size is too large, and is computationally expensive if the step size is too small. Lasso yields a complete sparsification process when the tuning parameter ranges over all of its values, but because the parameter is continuous, obtaining the complete process requires computing solutions over a large grid of parameter values, which is also computationally expensive. Solving the Lasso model is itself difficult. To address these problems, this thesis proposes a feature selection method based on the correlation of the original data. The method (the "formula method") applies the idea of modified least angle regression to feature selection and does not center the response variable during computation. This yields a correspondence between the correlations of the independent variables with the response variable and the Lasso tuning parameter. After one pass of an algorithm similar to modified least angle regression, this correspondence gives an explicit Lasso solution for any parameter value on the given data. The formula method not only improves the accuracy of the Lasso solution but is also faster than other algorithms when Lasso must be evaluated over a large grid of parameter values. We apply the formula method to a diabetes data study and compare it with coordinate descent and a quadratic approximation algorithm; the formula method gives the most accurate solution. We also compare the running times of the three algorithms under different dimensions, sample sizes, and numbers of parameter grid points; the formula method takes the least time, and its running time grows much more slowly than that of the other two methods as the dimension, sample size, and number of grid points increase. The idea behind the formula method can also be used to explain other algorithms for solving Lasso, such as coordinate descent.
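The link the abstract draws between variable-response correlations and the Lasso tuning parameter can be seen in a textbook special case. The sketch below is not the thesis's formula method. It assumes the objective 0.5 * ||y - Xb||^2 + lam * ||b||_1 and a design with orthonormal columns, where the Lasso solution is known to be the soft-thresholded inner product of each column with y. The function names and the toy data are made up for illustration, and a plain coordinate descent routine is included only as a cross-check.

```python
# Illustrative sketch only; NOT the thesis's formula method.
# Assumed objective: 0.5 * ||y - X b||^2 + lam * ||b||_1.


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; return 0.0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def lasso_orthonormal(columns, y, lam):
    """Closed-form Lasso solution; valid only if the columns are orthonormal."""
    return [soft_threshold(dot(x, y), lam) for x in columns]


def lasso_coordinate_descent(columns, y, lam, sweeps=100):
    """Cyclic coordinate descent for the same objective (works for any design)."""
    beta = [0.0] * len(columns)
    residual = list(y)  # equals y - X beta, since beta starts at zero
    for _ in range(sweeps):
        for j, x in enumerate(columns):
            norm_sq = dot(x, x)
            # Inner product of column j with the residual that excludes column j.
            rho = dot(x, residual) + norm_sq * beta[j]
            new_bj = soft_threshold(rho, lam) / norm_sq
            delta = new_bj - beta[j]
            residual = [r - delta * xi for r, xi in zip(residual, x)]
            beta[j] = new_bj
    return beta


# Hypothetical toy data: two orthonormal columns with x1.y = 3 and x2.y = 2.
X_COLUMNS = [[0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]]
Y = [3.0, 1.0, 2.0, 0.0]

for lam in (3.5, 2.5, 1.0):
    print(lam, lasso_orthonormal(X_COLUMNS, Y, lam),
          lasso_coordinate_descent(X_COLUMNS, Y, lam))
```

In this toy case the first variable enters the model once lam falls below 3 and the second once lam falls below 2, so the inner products with y directly mark the parameter values at which the active set changes. For a general, non-orthogonal design no such one-line formula applies, which is the setting the thesis addresses.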
[Degree-granting institution]: Lanzhou University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: C81




