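Influence Analysis of Sufficient Dimension Reduction Methods Based on Distribution Weighting
Published: 2018-07-04 08:15
Topics: sufficient dimension reduction + distribution-weighted estimation; Source: Yunnan University of Finance and Economics, master's thesis, 2014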
[Abstract]: Inferring the distribution of one variable (the response) given a set of other random variables (the predictors) is a central problem in statistics. When the number of predictors is large, regressing the response directly on them is likely to run into the "curse of dimensionality". In many settings, however, the response depends on the predictors only through a few linear combinations of them; in other words, given the values of those linear combinations, the response is independent of the full set of predictors. If these linear combinations can be identified and the response regressed on them, the difficulties caused by high-dimensional predictors are resolved. The task of sufficient dimension reduction (SDR) is precisely to find these linear combinations without assuming a parametric model in advance. With the dimension and scale of data growing across disciplines in recent years, dimension reduction has drawn wide attention, and model-free sufficient dimension reduction has become an active topic in statistics. Because SDR is an important stage of high-dimensional nonparametric regression and its output is the basis for the subsequent regression analysis, its robustness matters greatly in modeling, which makes influence analysis of SDR methods necessary.

Influence analysis is an important part of statistical diagnostics. It is concerned with how sensitive the results of statistical inference are to the initial specification of the model. Influence analysis in SDR examines the robustness of SDR methods, that is, whether certain aspects of the model (for example, particular data points) affect the result of these methods far more than average. In a sense, influence analysis assesses whether the dimension reduction result can be trusted. However, because the inferential output in SDR is a vector space, existing influence analysis methods do not apply. This thesis studies influence analysis for the distribution-weighted partial least squares estimator under the single-index model, and for the cumulative slicing estimation method under the multiple-index model. Using case deletion and local influence analysis, it addresses the detection of influential points, in particular special influence patterns such as the masking effect.

The main results are:
1. In the influence analysis of distribution-weighted partial least squares estimation and cumulative slicing estimation, a space displacement function is constructed from the canonical trace correlation coefficient proposed by Hooper (1959), to measure the difference between the estimated dimension reduction subspaces before and after perturbation. This measure is invariant to the choice of basis vectors for the subspace, and it takes into account both the covariance structure of the predictors and the statistical meaning of the dimension reduction subspace.
2. Based on this space displacement function, a notion of quasi-curvature is introduced to measure the local influence of a perturbation on the estimated dimension reduction subspace, together with a method for finding the perturbation direction that maximizes the quasi-curvature. After standardization, this maximal perturbation direction serves as an influence assessment statistic.

These results generalize the normal curvature method based on the likelihood displacement function proposed by Cook (1986). Simulation results show that the proposed method performs well in detecting influential points.
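The abstract does not give the formula for the space displacement function, so the following Python sketch is only an illustration of the idea under stated assumptions. It assumes the displacement is one minus the squared trace correlation, taken here as the mean squared canonical correlation between the two sets of reduced predictors under the predictor covariance. It also assumes a basic cumulative slicing estimator with uniform weights over the observed responses. The function names (`trace_correlation_sq`, `space_displacement`, `cume_directions`, `case_deletion_displacements`) are hypothetical and do not come from the thesis.

```python
import numpy as np


def trace_correlation_sq(B1, B2, Sigma):
    """Mean squared canonical correlation between B1^T X and B2^T X,
    where Cov(X) = Sigma. B1 and B2 are assumed to have the same number
    of columns d. The value depends only on span(B1) and span(B2)."""
    p = len(Sigma)
    B1 = np.asarray(B1, dtype=float).reshape(p, -1)
    B2 = np.asarray(B2, dtype=float).reshape(p, -1)
    S11 = B1.T @ Sigma @ B1
    S22 = B2.T @ Sigma @ B2
    S12 = B1.T @ Sigma @ B2
    # trace(S11^{-1} S12 S22^{-1} S21) = sum of squared canonical correlations
    M = np.linalg.solve(S11, S12) @ np.linalg.solve(S22, S12.T)
    return float(np.trace(M)) / B1.shape[1]


def space_displacement(B1, B2, Sigma):
    """Assumed displacement: 0 for identical subspaces, 1 when the reduced
    predictors are uncorrelated."""
    return 1.0 - trace_correlation_sq(B1, B2, Sigma)


def cume_directions(X, y, d):
    """Basic cumulative slicing estimate of d directions (a sketch).
    Uses m(y_i) = mean_j (x_j - xbar) * 1{y_j <= y_i} and solves the
    generalized eigenproblem K b = lambda * Sigma * b."""
    n, p = X.shape
    Sigma = np.cov(X, rowvar=False)
    Xc = X - X.mean(axis=0)
    ind = (y[:, None] <= y[None, :]).astype(float)  # ind[j, i] = 1{y_j <= y_i}
    m = Xc.T @ ind / n                              # column i holds m(y_i)
    K = m @ m.T / n                                 # kernel matrix, X scale
    L = np.linalg.cholesky(Sigma)
    Linv = np.linalg.inv(L)
    w, U = np.linalg.eigh(Linv @ K @ Linv.T)        # eigenvalues ascending
    top = U[:, ::-1][:, :d]                         # leading d eigenvectors
    return Linv.T @ top                             # back to the X scale


def case_deletion_displacements(X, y, d):
    """Displacement between the full-data subspace estimate and the
    estimate with case i deleted, for every i."""
    n = len(y)
    Sigma = np.cov(X, rowvar=False)
    B_full = cume_directions(X, y, d)
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        B_i = cume_directions(X[keep], y[keep], d)
        out[i] = space_displacement(B_full, B_i, Sigma)
    return out
```

Cases with unusually large displacement would be the candidates for influential points. Deleting one case at a time can miss masking effects, which is the motivation the abstract gives for the local influence (quasi-curvature) approach.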
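[Degree-granting institution]: Yunnan University of Finance and Economics
[Degree level]: Master's
[Year conferred]: 2014
[Classification number]: C81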
Article ID: 2095422
Article link: https://www.wllwen.com/shekelunwen/shgj/2095422.html