Robust Estimation for Generalized Linear Models and Its Medical Applications

Published: 2018-04-30 11:40

Topic: generalized linear model + robust estimation; source: Shanxi Medical University, master's thesis, 2009

[Abstract]: Generalized linear models (GLMs) are a widely applicable class of models that can accommodate both continuous and discrete response variables, the latter including categorical and count data. This makes them important in applied statistics, particularly in the analysis of biological, medical, economic and social data. However, the classical fitting method, maximum likelihood estimation (MLE), is easily affected by outliers and can even lead to wrong conclusions. Robust estimation methods that effectively resist outliers are therefore of considerable importance.

This thesis reviews and compares four robust estimation methods applicable to GLMs: the Mallows quasi-likelihood estimator, the conditionally unbiased bounded-influence estimator (CUBIF), the Mallows leverage-downweighting estimator, and the consistent misclassification model estimator. Building on the basic theory of robust regression, it first describes the underlying ideas and robustness properties of these four methods in detail. The latter two methods apply only to logistic regression. In the simulation study, three x-direction downweighting schemes were considered for the Mallows quasi-likelihood estimator (the hat matrix, the minimum volume ellipsoid (MVE) and the minimum covariance determinant (MCD)), and two downweighting functions (Carroll and Huber) for the Mallows leverage-downweighting estimator. The simulations were designed around two common GLMs, logistic regression and Poisson regression. For each model, the simulated samples were contaminated with two types of outliers (in the y direction only, and in both the x and y directions) at varying proportions, in order to assess how well each applicable estimator resists outliers of different types and proportions.

The simulation study leads to the following conclusions:

1. Compared with classical MLE, these robust estimators resist the influence of outliers better to some extent and describe the structure that best fits the bulk of the data. They identify outliers, influential points and structures that deviate from the model more clearly. When the data contain no influential points, their estimates are as good as the classical MLE; when the conditions for MLE are not met, the robust estimates are far better.
2. For both logistic and Poisson regression, the Mallows quasi-likelihood estimator with MVE-based or MCD-based downweighting resisted outliers better than the other estimators. Hat-matrix-based downweighting had a lower breakdown point because the hat matrix is itself not robust.
3. Because its weight function depends only on outlyingness in the x direction, the Mallows leverage-downweighting estimator already fails with as little as 1% of pure y-direction outliers, although it resists outliers well when both x and y are anomalous. However, since its weight function avoids outlying observations by assigning them weight 0, a growing proportion of outliers very easily produces perfect separation in the logistic regression model, so that no estimate exists. The downweighting also discards a large amount of sample information.
4. The consistent misclassification model estimator performs worse than the previous two methods but is more robust than MLE. Its drawback is that it may forcibly downweight normal observations.
5. CUBIF, as a bounded-influence estimator, can account for anomalies in the x and y directions simultaneously, but its performance is inferior to that of the other robust estimators.

Finally, two real examples are used to explore the practical application of these methods.
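The leverage-downweighting idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes a single covariate, uses the median and MAD as a one-dimensional stand-in for MVE/MCD robust distances, and uses a smooth weight of the form (1 - (d/c)^2)^3 that drops to 0 beyond a cutoff. The weight form, the cutoff c = 3, the toy data and all function names are assumptions made for illustration only.

```python
import math
from statistics import median


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def robust_distances(x):
    """Distance from the median in MAD units (1-D stand-in for MVE/MCD distances)."""
    med = median(x)
    mad = 1.4826 * median([abs(v - med) for v in x])
    return [abs(v - med) / mad for v in x]


def leverage_weight(d, c=3.0):
    """Assumed redescending weight: smooth inside the cutoff c, exactly 0 outside."""
    return (1.0 - (d / c) ** 2) ** 3 if d < c else 0.0


def fit_weighted_logistic(x, y, w, n_iter=50, tol=1e-10):
    """Newton-Raphson for the weighted logistic log-likelihood with one covariate."""
    a = b = 0.0
    for _ in range(n_iter):
        ga = gb = haa = hab = hbb = 0.0
        for xi, yi, wi in zip(x, y, w):
            p = sigmoid(a + b * xi)
            r = wi * (yi - p)          # weighted score contribution
            v = wi * p * (1.0 - p)     # weighted information contribution
            ga += r
            gb += r * xi
            haa += v
            hab += v * xi
            hbb += v * xi * xi
        det = haa * hbb - hab * hab
        da = (hbb * ga - hab * gb) / det
        db = (haa * gb - hab * ga) / det
        a += da
        b += db
        if abs(da) + abs(db) < tol:
            break
    return a, b


# Toy data (hypothetical): a clean sample with a positive slope and some overlap,
# plus two bad leverage points at x = 10 with y = 0.
x = [-3, -2.5, -2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5, 2, 2.5, 3, 10, 10]
y = [0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0]

weights = [leverage_weight(d) for d in robust_distances(x)]

mle_intercept, mle_slope = fit_weighted_logistic(x, y, [1.0] * len(x))
rob_intercept, robust_slope = fit_weighted_logistic(x, y, weights)

print("MLE slope:   ", mle_slope)
print("Robust slope:", robust_slope)
print("Weights of the two leverage points:", weights[-2:])
```

In this toy setting the two leverage points receive weight 0, so the weighted fit is driven by the bulk of the data while the unweighted MLE is pulled toward them. The same mechanism also shows the risk noted in conclusion 3: zero weights remove observations outright, and if the remaining points happen to be perfectly separated the weighted estimate no longer exists.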
[Degree-granting institution]: Shanxi Medical University
[Degree level]: Master's
[Year degree conferred]: 2009
[Classification number]: R311
