Variable Selection for the Simplex Distribution Model and Its Application
Published: 2018-11-18 12:22
【Abstract】: Barndorff-Nielsen and Jorgensen (1991) proposed the simplex distribution, a continuous distribution on the unit interval (0, 1) that is well suited to analyzing proportional data. Economic research often involves data that can be modeled by the simplex distribution: quantities such as the Gini coefficient and the Engel coefficient are random variables taking values in (0, 1). Variable selection is one of the most important problems in statistical modeling, and methods for it have long been an active research topic in statistics. The Lasso method proposed by Tibshirani (1996) performs variable selection and parameter estimation simultaneously and overcomes some shortcomings of traditional methods, bringing new vitality to the field of model selection. In particular, the introduction of LARS (Efron et al., 2004), an efficient algorithm for computing the Lasso, made the Lasso widely popular. This thesis uses the LARS-Lasso method to study variable selection for the simplex distribution model and applies it to an analysis of the factors influencing China's Gini coefficient. The main work covers three aspects:
1. The LARS-Lasso method is applied to variable selection in the simplex distribution model, yielding the Lasso estimate for the model. Starting from the likelihood function of the simplex distribution model, a Lasso penalty term is introduced, and a Newton iteration is used to form a local quadratic approximation that converts the maximum likelihood (ML) problem into a least squares (LS) type problem, so that the estimate can be computed efficiently with the LARS algorithm.
2. Simulation study. Programs written in R compare the performance of the LARS-Lasso method with stepwise regression, verifying the feasibility and effectiveness of LARS-Lasso variable selection for the simplex distribution model.
3. Case analysis. First, an exploratory analysis is carried out on a selection of historical Gini coefficients for countries around the world published on the World Bank website (sample size 644). A frequency histogram, a kernel density estimate, a fitted normal curve, and a fitted simplex curve are drawn in R, indicating that it is reasonable to treat proportional data as approximately simplex distributed. The method is then applied to the factors influencing China's Gini coefficient, and it identifies the urban-rural income gap and social security expenditure, among other factors, as the main variables affecting the Gini coefficient.
In summary, this thesis gives a fairly systematic study of the Lasso method for variable selection in the simplex distribution model, extending the work of Tibshirani (1996) and of Barndorff-Nielsen and Jorgensen (1991). The simulation study and the case analysis show that the proposed method is simple and effective, and that it is of particular practical value for the analysis of proportional data.
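As an illustration of the first contribution, the sketch below writes out the simplex density in its standard form and the L1-penalized negative log-likelihood that a Lasso step would minimize. The logit link, the unpenalized intercept, the function names, and the parameter values are all assumptions made for illustration. The thesis itself worked in R, and its Newton/LARS solver is not reproduced here; the code only evaluates the objective.

```python
import math

def unit_deviance(y, mu):
    """Unit deviance d(y; mu) of the simplex distribution."""
    return (y - mu) ** 2 / (y * (1.0 - y) * mu ** 2 * (1.0 - mu) ** 2)

def simplex_logpdf(y, mu, sigma2):
    """Log-density of the simplex distribution on (0, 1)."""
    return (-0.5 * math.log(2.0 * math.pi * sigma2)
            - 1.5 * math.log(y * (1.0 - y))
            - unit_deviance(y, mu) / (2.0 * sigma2))

def simplex_pdf(y, mu, sigma2):
    """Density; returns 0 outside the open unit interval."""
    if not 0.0 < y < 1.0:
        return 0.0
    return math.exp(simplex_logpdf(y, mu, sigma2))

def penalized_nll(beta, X, y, sigma2, lam):
    """Negative simplex log-likelihood plus an L1 (Lasso) penalty.

    Assumption: logit link, mu_i = 1 / (1 + exp(-x_i' beta)), and
    beta[0] is an unpenalized intercept (each row of X starts with 1.0).
    """
    nll = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, xi))
        mu = 1.0 / (1.0 + math.exp(-eta))
        nll -= simplex_logpdf(yi, mu, sigma2)
    return nll + lam * sum(abs(b) for b in beta[1:])

def integrate(f, n=20000):
    """Midpoint rule on (0, 1); never evaluates f at the endpoints."""
    h = 1.0 / n
    return h * sum(f((k + 0.5) * h) for k in range(n))

# Illustrative parameter values (hypothetical, not taken from the thesis).
mu, sigma2 = 0.4, 1.0
total = integrate(lambda t: simplex_pdf(t, mu, sigma2))      # should be close to 1
mean = integrate(lambda t: t * simplex_pdf(t, mu, sigma2))   # should be close to mu

# Tiny made-up design matrix and responses, only to exercise the objective.
X = [[1.0, 0.5], [1.0, -1.0]]
y = [0.3, 0.6]
beta = [0.1, 0.2]
objective = penalized_nll(beta, X, y, sigma2, lam=2.0)
```

The numerical integrals serve as a sanity check that the density is normalized and has mean mu. Minimizing `penalized_nll` over `beta` is the step that the abstract describes as being handled by a local quadratic approximation followed by LARS.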
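【Degree-granting institution】: Guizhou University of Finance and Economics
【Degree level】: Master's
【Year degree conferred】: 2013
【Classification number】: F224; F124.7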
Article ID: 2340029
Article link: https://www.wllwen.com/jingjilunwen/zhongguojingjilunwen/2340029.html