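Adaptive Group Lasso Variable Selection Based on Quantile Regression
Published: 2017-12-27 13:29
Keywords: adaptive group Lasso variable selection based on quantile regression. Source: Southwest Jiaotong University, 2017 master's thesis. Type: degree thesis
Related topics: quantile regression; group variable selection; adaptive group Lasso; Oracle property; tuning parameter selection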
[Abstract]: In recent years, the quantile regression proposed by Koenker has been widely developed and applied, both in theory and in methodology. Unlike mean regression, quantile regression requires no specific assumption on the error distribution, and its loss function is a weighted sum of absolute deviations. The estimated regression coefficients are therefore insensitive to outliers and robust relative to least squares, and the method gives a fuller picture of how the explanatory variables affect different quantiles of the response variable. As a robust alternative to mean regression, quantile regression is thus widely used to study the underlying relationship between response and explanatory variables. This thesis studies penalized quantile regression for linear models with grouped explanatory variables when the variable dimension p is fixed. To select the nonzero variable groups and estimate the regression coefficients at the same time, it considers the quantile estimator with an adaptive group Lasso penalty, proves that the variable selection is consistent and that the estimated nonzero coefficients are asymptotically normal, and thereby establishes the Oracle property of the adaptive group Lasso estimator. In numerical simulations where the random error term follows a peaked, heavy-tailed distribution (such as the Cauchy distribution), the adaptive group Lasso quantile estimator (agLasso-Q) identifies the zero coefficients more accurately than the adaptive group Lasso estimator (agLasso-LS), and its performance improves as the sample size grows.
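The abstract does not state the objective function, so the following is only a minimal sketch of the form such an estimator usually takes: the check loss ρ_τ(u) = u(τ − 1{u < 0}) summed over observations, plus λ times a weighted sum of group-wise L2 norms, with weights built from an initial estimate. The function names, the exponent `gamma`, and the small constant `eps` are illustrative assumptions, not the thesis's notation.

```python
import numpy as np

def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0}), elementwise."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

def adaptive_weights(beta_init, groups, gamma=1.0, eps=1e-8):
    """Assumed weight form w_g = 1 / ||beta_init_g||_2^gamma.

    eps only guards against division by zero for groups whose
    initial estimate is exactly zero.
    """
    beta_init = np.asarray(beta_init, dtype=float)
    return np.array(
        [1.0 / (np.linalg.norm(beta_init[g]) + eps) ** gamma for g in groups]
    )

def agl_quantile_objective(beta, X, y, tau, lam, groups, weights):
    """Sum of check losses plus lam * sum_g w_g * ||beta_g||_2."""
    beta = np.asarray(beta, dtype=float)
    resid = y - X @ beta
    fit = check_loss(resid, tau).sum()
    penalty = sum(w * np.linalg.norm(beta[g]) for w, g in zip(weights, groups))
    return fit + lam * penalty

# Toy usage: three coefficients split into two groups.
X = np.eye(3)
y = np.array([3.0, 4.0, 1.0])
groups = [[0, 1], [2]]
beta = np.array([3.0, 4.0, 0.0])
value = agl_quantile_objective(beta, X, y, tau=0.5, lam=2.0,
                               groups=groups, weights=np.array([0.5, 2.0]))
```

A group whose initial estimate is near zero receives a very large weight, which is what pushes the whole group to zero in the penalized fit. Minimizing this objective needs a dedicated solver (for example linear or second-order cone programming), which is not shown here.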
For choosing the tuning parameter in the proposed adaptive group Lasso quantile regression, the thesis departs from the information criteria such as AIC and BIC that are commonly used in penalized quantile regression. It considers a penalized cross-validation method, PCV, which uses an SIC-type criterion with a penalty on model complexity as the loss function for ten-fold cross-validation. It proves theoretically that variable selection under PCV is consistent, and compares this criterion with other tuning parameter selection criteria. Simulations at different quantile levels show that when the random error term comes from a peaked, heavy-tailed distribution, at the τ = 0.05 and τ = 0.95 quantiles, the PCV criterion estimates the group regression coefficients better than the Schwarz information criterion and cross-validation, mainly in the form of a smaller mean squared error.
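The abstract gives no formula for PCV, so the sketch below is only one plausible reading of the idea: score each held-out fold with an SIC-shaped loss, here assumed to be log(mean check loss) + df · log(n) / (2n), and average over ten folds. The SIC shape, the use of the fold size as n, the count of nonzero coefficients as df, and the user-supplied solver `fit` are all assumptions made for illustration.

```python
import numpy as np

def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0}), elementwise."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

def sic_score(resid, tau, df):
    """Assumed SIC-shaped score: log(mean check loss) + df * log(n) / (2 n)."""
    n = len(resid)
    return np.log(check_loss(resid, tau).mean()) + df * np.log(n) / (2 * n)

def pcv_score(X, y, tau, lam, fit, n_folds=10, seed=0, tol=1e-8):
    """Average the SIC-shaped score over held-out folds for one value of lam.

    `fit(X_train, y_train, tau, lam)` is a hypothetical solver supplied by
    the caller; it must return a coefficient vector of length X.shape[1].
    """
    n = len(y)
    order = np.random.default_rng(seed).permutation(n)
    scores = []
    for held_out in np.array_split(order, n_folds):
        train = np.setdiff1d(np.arange(n), held_out)
        beta = fit(X[train], y[train], tau, lam)
        df = int(np.sum(np.abs(beta) > tol))  # model size = nonzero coefficients
        resid = y[held_out] - X[held_out] @ beta
        scores.append(sic_score(resid, tau, df))
    return float(np.mean(scores))

# Toy usage with a placeholder solver that always returns the zero vector.
def zero_fit(X_train, y_train, tau, lam):
    return np.zeros(X_train.shape[1])

X_demo = np.ones((20, 2))
y_demo = np.ones(20)
demo = pcv_score(X_demo, y_demo, tau=0.5, lam=1.0, fit=zero_fit)
```

In practice one would evaluate `pcv_score` over a grid of `lam` values and keep the minimizer. The score is undefined when a fold is fitted perfectly (zero mean check loss), which a real implementation would need to handle.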
[Degree-granting institution]: Southwest Jiaotong University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: F224