Research on Parameter Estimation and Variable Selection Methods in Generalized Linear Models

Published: 2019-03-20 14:02

[Abstract]: Model selection is a central problem in statistical analysis, and building a more accurate model is a primary concern for researchers. When a model suffers from multicollinearity, how to handle it becomes the key question. This thesis examines the problem for generalized linear models in two cases and proposes how to handle each.
In the first case, every variable in the model is indispensable, yet the variables are multicollinear. Ridge estimation is the usual choice here: it retains all of the variables while applying a shrinkage penalty to the model, which mitigates the multicollinearity. Ridge estimation depends on a ridge parameter, however, and the choice of that parameter directly affects the accuracy of the model. The thesis therefore summarizes existing methods for estimating the ridge parameter in general linear models and generalized linear models, proposes a new ridge parameter estimator, and applies all of these estimators to the logistic ridge regression model. Monte Carlo simulation is used to compare them by the mean squared error (MSE) of the model, the mean of the parameters, and the standard deviation (SD) of the parameters. In the logistic regression model, the newly proposed estimator not only has a relatively small MSE but is also the most stable of the estimators compared, so it is judged to be relatively better.
In the second case, the model is a large one that contains variables contributing nothing to it, so the variables must be screened: a shrinkage penalty shrinks the regression coefficients of some explanatory variables to zero, which achieves variable selection. The thesis first reviews classical variable selection methods from the literature: LASSO, SCAD, Elastic Net, and MCP. Because Breheny and Huang (2011) pointed out that MCP performs relatively better than LASSO and SCAD in both the general linear regression model and the logistic regression model, the thesis applies these four methods to the Poisson regression model and runs simulation experiments under different settings. When the variables are relatively independent, MCP accurately identifies the explanatory variables with nonzero coefficients while selecting the fewest irrelevant variables. When the variables are correlated, MCP is again the most accurate of these methods at identifying the required variables. When the variables exhibit a group effect, MCP also performs very well. The thesis concludes that MCP is relatively better than LASSO, SCAD, and Elastic Net as a variable selection method.
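To make the comparison of penalties concrete, the following is a minimal sketch of the four penalty functions in their standard textbook forms, evaluated on a single coefficient. It is not taken from the thesis: the function names are hypothetical, the defaults a = 3.7 (SCAD) and gamma = 3 (MCP) are commonly used values assumed here for illustration, and the Elastic Net is written in one common parametrization with a mixing weight alpha.

```python
def lasso_penalty(beta, lam):
    """LASSO: lam * |beta|, which grows without bound as |beta| grows."""
    return lam * abs(beta)


def elastic_net_penalty(beta, lam, alpha=0.5):
    """Elastic Net: mix of an L1 term and a ridge (L2) term.

    alpha is an assumed mixing weight; alpha = 1 reduces to the LASSO.
    """
    return lam * (alpha * abs(beta) + 0.5 * (1.0 - alpha) * beta ** 2)


def scad_penalty(beta, lam, a=3.7):
    """SCAD: LASSO-like near zero, quadratic in between, then constant.

    Requires a > 2; a = 3.7 is an assumed conventional default.
    """
    b = abs(beta)
    if b <= lam:
        return lam * b
    if b <= a * lam:
        return (2.0 * a * lam * b - b ** 2 - lam ** 2) / (2.0 * (a - 1.0))
    return lam ** 2 * (a + 1.0) / 2.0


def mcp_penalty(beta, lam, gamma=3.0):
    """MCP: penalization rate decreases linearly to zero at |beta| = gamma * lam.

    Requires gamma > 1; gamma = 3 is an assumed conventional default.
    """
    b = abs(beta)
    if b <= gamma * lam:
        return lam * b - b ** 2 / (2.0 * gamma)
    return gamma * lam ** 2 / 2.0


if __name__ == "__main__":
    lam = 1.0
    for beta in (0.5, 2.0, 10.0):
        print(
            beta,
            lasso_penalty(beta, lam),
            elastic_net_penalty(beta, lam),
            scad_penalty(beta, lam),
            mcp_penalty(beta, lam),
        )
```

For large coefficients the SCAD and MCP penalties level off at (a + 1) * lam^2 / 2 and gamma * lam^2 / 2 respectively, whereas the LASSO penalty keeps growing linearly. This is the usual explanation for why the nonconvex penalties shrink large, genuinely nonzero coefficients less than the LASSO does.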
【Degree-granting institution】: Chongqing University
【Degree level】: Master's
【Year degree awarded】: 2015
【Classification number】: O212
