A Smoothing Algorithm for Censored Quantile Regression
Topic: censored quantile regression + smooth function; Source: Beijing Jiaotong University, 2015 master's thesis
[Abstract]: In recent years, high-dimensional linear regression models have attracted much attention in information technology, biology, chemometrics, genomics, economics, finance, functional magnetic resonance imaging, and other scientific fields. A "high-dimensional" regression model is one in which the number of unknown variables is much larger than the number of samples. Clearly, without additional assumptions such a problem is ill-posed and almost impossible to solve with current techniques, so we usually impose some assumptions on the model. A good choice is the sparsity assumption: only a small number of the unknown variables affect the observed sample values. High-dimensional data analysis poses many challenges for statisticians and urgently calls for new methods and theory.
To estimate the regression coefficients of a high-dimensional linear regression, we need to choose an appropriate regression method. Ordinary least-squares regression aims to estimate the mean of the dependent variable from the explanatory variables, whereas quantile regression models the conditional quantiles of the dependent variable given the independent variables. Compared with least-squares regression, conditional quantile regression has the advantages of robustness and flexibility. We therefore use quantile regression to solve the high-dimensional sparse linear regression model.
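The contrast between mean and quantile estimation can be illustrated with a minimal sketch. It assumes the standard quantile (check) loss rho_tau(u) = u * (tau - 1{u < 0}); the function names and the toy data are illustrative and do not come from the thesis.

```python
def check_loss(u, tau):
    """Quantile (check) loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def empirical_risk(q, ys, tau):
    """Total check loss when every observation is predicted by the constant q."""
    return sum(check_loss(y - q, tau) for y in ys)


def sample_quantile(ys, tau):
    """Minimize the check loss over the data points.

    The risk is piecewise linear and convex in q, so a minimizer is
    always attained at one of the observations.
    """
    return min(ys, key=lambda q: empirical_risk(q, ys, tau))


# Toy data with one gross outlier (hypothetical values).
ys = [1.0, 2.0, 3.0, 4.0, 100.0]

mean_estimate = sum(ys) / len(ys)          # least squares fits the mean
median_estimate = sample_quantile(ys, 0.5)  # check loss with tau = 0.5
lower_quartile = sample_quantile(ys, 0.25)

print(mean_estimate, median_estimate, lower_quartile)
```

The outlier pulls the mean far from the bulk of the data, while the minimizers of the check loss stay among the typical values, which is the robustness the paragraph above refers to.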
Adding a regularization term has long been an effective and widely used way of handling high-dimensional sparse data. Regularization can speed up convergence, and it also makes high-dimensional linear models easier to solve. With a regularization term, many regression models enjoy good oracle properties. There are many kinds of regularization terms, chiefly the lp, l1, and weighted l1 penalties. In this thesis we consider the weighted l1 penalty.
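As a rough illustration of how a weighted l1 penalty encourages sparsity, the sketch below evaluates the penalty sum_j w_j * |beta_j| and applies coordinate-wise soft thresholding, which is the proximal step for that penalty. The reweighting rule w_j = 1 / (|beta_j| + eps) is an assumption (a common adaptive choice), not the weighting scheme of the thesis, and all names and numbers are hypothetical.

```python
def weighted_l1(beta, weights):
    """Weighted l1 penalty: sum_j w_j * |beta_j|."""
    return sum(w * abs(b) for b, w in zip(beta, weights))


def soft_threshold(z, t):
    """Proximal map of t * |.|: shrink z toward zero by t, clipping at zero."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def adaptive_weights(beta, eps=1e-3):
    """Assumed reweighting rule: small coefficients receive large weights."""
    return [1.0 / (abs(b) + eps) for b in beta]


# Hypothetical current estimate: two large coefficients and one near zero.
beta = [2.0, 0.05, -1.5]
weights = adaptive_weights(beta)
lam = 0.1  # hypothetical regularization strength

# One proximal step: each coordinate gets its own threshold lam * w_j.
shrunk = [soft_threshold(b, lam * w) for b, w in zip(beta, weights)]
print(shrunk)
```

Because the near-zero coefficient carries a large weight, it is set exactly to zero, while the large coefficients are only slightly shrunk.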
In medicine, censored quantile regression is a powerful tool for survival analysis. Censored data are data whose sample values cannot be fully observed under some setting. For example, when a sample value is above or below some fixed (or random) value, we can observe only that fixed (or random) value. Data obtained this way are incomplete and are called censored data. In medicine, the censored quantile regression model has replaced the Cox proportional hazards model and the accelerated failure time (AFT) model as the main method of survival analysis. In this thesis we consider the regularized sparse high-dimensional censored quantile regression model. Because the censored quantile regression model can ultimately be rewritten as a linear combination of quantile regression models, methods for solving quantile regression can be applied to censored quantile regression.
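A small sketch can make the notion of censoring concrete. It assumes fixed right-censoring at a known point c, so that only min(y, c) and an indicator of whether the value was fully observed are recorded; the survival times below are made up for illustration.

```python
from statistics import mean, median


def censor_right(values, c):
    """Fixed right-censoring: record (min(v, c), observed_flag) for each value."""
    return [(min(v, c), v <= c) for v in values]


# Hypothetical survival times; the study ends at time 10.
times = [2.0, 5.0, 9.0, 12.0, 30.0]
observed = censor_right(times, 10.0)
obs_values = [t for t, _ in observed]

print(observed)
print(mean(times), mean(obs_values))      # the mean is distorted by censoring
print(median(times), median(obs_values))  # the median is unchanged here
```

In this example the censoring point lies above the median, so the median of the observed values equals the median of the true values, while the mean is biased downward. This is one reason quantile-based methods are a natural fit for censored data.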
In this thesis, we use smooth functions for the first time to solve the censored quantile regression problem. First, Chapters 1, 2, and 3 introduce the background and basic properties of quantiles, censored data, and high-dimensional data, respectively. Second, we present two smooth functions, including the quantile Huber penalty function, as replacements for the quantile loss function. Because the Huber penalty function has the same minimizer as the quantile loss function, the theoretical part of the thesis takes the Huber penalty function as its main object of study. Using a smooth function makes the objective function of the censored quantile regression model differentiable, so we obtain an objective function with first- and second-order subdifferentials. Building on this differentiability and using the weighted l1 penalty, we design a weighted smoothing iterative algorithm, MIRL, for variable selection in censored quantile regression. We establish the convergence of the algorithm and prove that, under general assumptions, the optimal solution of the model has good statistical properties such as asymptotic normality and the oracle property. In the numerical part, we carry out experiments with random Gaussian matrices and with Toeplitz covariance matrices. The most prominent feature of the result tables is that the FPR and TPR are almost 0 and 1, respectively. This indicates that our method accurately picks out the relevant variables, so the model and algorithm perform variable selection well. The experimental errors are also very small.
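The idea of replacing the non-smooth quantile loss with a Huber-type smooth function can be sketched as follows. The exact smoothing function used in the thesis is not given in this abstract, so the form below is an assumption: it is the Moreau envelope of the check loss with a hypothetical smoothing parameter gamma, quadratic on [(tau - 1) * gamma, tau * gamma] and linear outside.

```python
def check_loss(u, tau):
    """Non-smooth quantile loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def huber_quantile_loss(u, tau, gamma):
    """Assumed smooth surrogate: quadratic near zero, linear in the tails."""
    upper = tau * gamma
    lower = (tau - 1.0) * gamma
    if u > upper:
        return tau * u - 0.5 * tau * tau * gamma
    if u < lower:
        return (tau - 1.0) * u - 0.5 * (tau - 1.0) ** 2 * gamma
    return u * u / (2.0 * gamma)


def huber_quantile_grad(u, tau, gamma):
    """Derivative of the surrogate: u / gamma clipped to [tau - 1, tau]."""
    return min(tau, max(tau - 1.0, u / gamma))


tau, gamma = 0.3, 0.1  # hypothetical quantile level and smoothing parameter
grid = [k / 100.0 for k in range(-100, 101)]

# Largest gap between the check loss and its smooth surrogate on the grid.
max_gap = max(check_loss(u, tau) - huber_quantile_loss(u, tau, gamma) for u in grid)
print(max_gap, 0.5 * max(tau, 1.0 - tau) ** 2 * gamma)
```

Both losses attain their minimum value 0 at u = 0, the surrogate has a continuous derivative everywhere, and the gap between the two is at most max(tau, 1 - tau)^2 * gamma / 2, so a smaller gamma gives a closer approximation at the cost of a steeper quadratic region.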
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year degree conferred]: 2015
[Classification number]: O212.1
Article ID: 2107750
Article link: https://www.wllwen.com/kejilunwen/yysx/2107750.html