Research on Several Problems of Variable Selection in Regression Models
Published: 2018-05-19 16:52
Topic: variable selection + Gamma distribution; Source: Lanzhou Jiaotong University, 2017 master's thesis
【Abstract】: In multiple linear regression modeling, the choice of explanatory variables is crucial; the number of variables is generally constrained from two perspectives, prediction accuracy and model interpretability. A large set of explanatory variables captures more information about the response and can therefore achieve higher prediction accuracy, but too many variables weaken interpretability and greatly reduce practical value, while too few variables fail to capture enough information about the response and prediction accuracy drops markedly. Most research on variable selection starts from ordinary least squares and adds constraints on the parameters to be estimated, i.e., a penalty function, turning the problem into penalized least squares. The shrinkage induced by the constraints drives some of the estimated parameters exactly to zero, which accomplishes the selection. Classical algorithms of this type include the LASSO, the adaptive LASSO, SCAD, and the elastic net. This thesis assumes that the parameters to be estimated are subject to random effects, constructs a new penalty function and the corresponding penalized least squares estimator, and evaluates the method. The main contents are as follows.

First, the thesis reviews the development of variable selection methods and the basic idea of achieving selection by adding a penalty function, and analyzes in detail the construction, advantages, and drawbacks of the LASSO, adaptive LASSO, SCAD, and elastic net algorithms. Because of the characteristics of its penalty function, the LASSO tends to select too many variables and performs poorly under multicollinearity; the adaptive LASSO improves on the LASSO, yielding sparser coefficient estimates and selecting fewer variables; SCAD goes further, selecting fewer variables while its estimator enjoys sparsity, unbiasedness, continuity, and the oracle property; the elastic net combines the LASSO with classical ridge regression and is chiefly advantageous when the explanatory variables exhibit group effects.

Second, since the Gamma and Weibull distributions are two important families of lifetime distributions with wide application, the random effects on the parameters are assumed to follow a Gamma or a Weibull distribution, and new penalty functions and penalized least squares estimators are constructed accordingly. The new penalty functions are derived via hierarchical maximum likelihood estimation; their properties are discussed, a parameter estimation procedure is given, and the new penalized least squares estimators are shown to satisfy the oracle property.

Finally, the new variable selection methods are evaluated through case studies. Using mean squared error and mean absolute error as evaluation criteria, classical cases from the earlier literature are analyzed and the results are compared with those of the LASSO, adaptive LASSO, SCAD, and elastic net algorithms. The new algorithms show a clear advantage in sparse settings, outperforming all the other algorithms, while in non-sparse settings their performance differs little from that of the adaptive LASSO.
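For readers unfamiliar with the penalized least squares framework the abstract refers to, a brief sketch of the standard textbook forms of these penalties follows (the thesis's new Gamma- and Weibull-based penalty is not reproduced, since its exact form is not given in the abstract):

\[
\hat{\beta} \;=\; \arg\min_{\beta}\; \|y - X\beta\|_2^2 \;+\; \sum_{j=1}^{p} p_{\lambda}(|\beta_j|),
\]
where the choice of penalty \(p_{\lambda}\) determines the method:
\[
\text{LASSO: } p_{\lambda}(|\beta_j|) = \lambda |\beta_j|, \qquad
\text{adaptive LASSO: } p_{\lambda}(|\beta_j|) = \lambda\, \hat{w}_j |\beta_j|,\;\; \hat{w}_j = 1/|\hat{\beta}_j^{\mathrm{OLS}}|^{\gamma},
\]
\[
\text{elastic net: } p_{\lambda}(|\beta_j|) = \lambda_1 |\beta_j| + \lambda_2 \beta_j^2, \qquad
\text{SCAD: } p_{\lambda}'(\theta) = \lambda\Big\{ I(\theta \le \lambda) + \tfrac{(a\lambda - \theta)_+}{(a-1)\lambda}\, I(\theta > \lambda) \Big\},\; \theta > 0,\; a \approx 3.7.
\]
Coefficients driven exactly to zero by the L1-type penalties are dropped from the model, which is how shrinkage performs variable selection.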
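The comparison described in the final paragraph of the abstract can be illustrated with a minimal Python sketch using scikit-learn, assuming synthetic sparse data in place of the thesis's actual case study; SCAD and the new Gamma/Weibull-based penalties are omitted because no off-the-shelf implementation is assumed here, and the adaptive LASSO is obtained by the usual column-reweighting trick.

# Illustrative only: synthetic sparse data, not the thesis's case study.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Sparse setting: only 5 of 30 predictors carry signal.
X, y = make_regression(n_samples=200, n_features=30, n_informative=5,
                       noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def report(name, y_pred, coef):
    # Mean squared error / mean absolute error, plus the number of
    # predictors whose coefficient was not shrunk to zero.
    print(f"{name:15s} MSE={mean_squared_error(y_te, y_pred):9.2f} "
          f"MAE={mean_absolute_error(y_te, y_pred):7.2f} "
          f"selected={int(np.sum(coef != 0))}")

# LASSO: the L1 penalty shrinks some coefficients exactly to zero.
lasso = Lasso(alpha=1.0).fit(X_tr, y_tr)
report("LASSO", lasso.predict(X_te), lasso.coef_)

# Elastic net: mixed L1/L2 penalty, better behaved when predictors
# are correlated or form groups.
enet = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X_tr, y_tr)
report("Elastic net", enet.predict(X_te), enet.coef_)

# Adaptive LASSO via reweighting: rescaling column j by |beta_ols_j|**gamma
# makes the implicit penalty weight 1/|beta_ols_j|**gamma, so large OLS
# coefficients are penalized less and small ones more.
gamma = 1.0
w = np.abs(LinearRegression().fit(X_tr, y_tr).coef_) ** gamma
ada = Lasso(alpha=1.0).fit(X_tr * w, y_tr)
report("Adaptive LASSO", ada.predict(X_te * w), ada.coef_ * w)

In a sparse setting like this, the L1-based methods typically retain only a handful of the 30 predictors, which mirrors the kind of comparison by MSE, MAE, and model size that the thesis reports.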
【Degree-granting institution】: Lanzhou Jiaotong University
【Degree level】: Master's
【Year degree conferred】: 2017
【Classification number】: F224
Article link: https://www.wllwen.com/jingjifazhanlunwen/1910897.html