Bayesian Analysis Based on Regression Models for Zero-Inflated Count Data

Posted: 2017-12-27 09:18

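Keywords: Bayesian analysis based on regression models for zero-inflated count data; Source: Kunming University of Science and Technology, master's thesis, 2015; Type: degree thesis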

[Abstract]: Count data arise widely in biomedicine, finance and insurance, public health, and risk control, and zero inflation is one of their characteristic features. Zero inflation means that the proportion of zero observations far exceeds what the fitted distribution allows, so the distribution is inflated at zero. The zero-inflated Poisson (ZIP) regression model is the usual choice for such data. Count data also often exhibit overdispersion: data are called overdispersed when their variance exceeds their mean. Compared with the traditional ZIP regression model, the zero-inflated negative binomial (ZINB) regression model better accounts for overdispersion and is a powerful tool for analyzing overdispersed count data. Existing methods and theory concentrate mostly on likelihood-based analysis of count data. By comparison, Bayesian analysis of the count data that are so common in practice leaves considerable room for research, and Bayesian inference for hierarchical regression models of overdispersed count data in particular still needs further development. Compared with maximum likelihood, the Bayesian approach incorporates prior information alongside the sample and is flexible in modeling certain distributions. It is also computationally feasible and effective, especially for missing data and complex models. This thesis therefore studies zero-inflated, overdispersed count data from a Bayesian perspective.
First, to address zero inflation, the thesis builds a ZIP regression model combined with a Probit model. It develops an MCMC scheme that combines Gibbs sampling with the Metropolis-Hastings (M-H) algorithm to obtain Bayesian estimates of the model parameters. On this basis, it uses the deviance information criterion (DIC) to compare and select models and the partial posterior predictive p-value to assess goodness of fit.
In addition, because of sampling procedures and questionnaire design, count data often show within-group correlation and between-group independence. Classical theory for longitudinal count data always assumes that both the random effects and the random errors are normally distributed. In practice this assumption lacks statistical robustness and modeling flexibility. For "non-normal" data that are sharply peaked, heavy-tailed, or asymmetric, it can lead to biased or even invalid inferences. The thesis therefore focuses on Bayesian analysis of a ZINB hierarchical regression model under skew-normal distributions. Specifically, it builds a ZINB hierarchical regression model for zero-inflated count data in which the random errors and random effects follow skew-normal distributions. For posterior inference, it uses data augmentation and the stochastic representation of the skew-normal distribution to construct a three-level Bayesian model, from which the posterior distribution of the model is obtained. A real-data example shows that the proposed method is effective.
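The first model in the abstract is a ZIP regression whose zero-inflation probability is linked to covariates through a Probit model. The following is a minimal sketch of its log-likelihood. It assumes the common parameterization pi_i = Phi(w_i . gamma) for the structural-zero probability and log(lambda_i) = x_i . beta for the Poisson mean. The function names and the covariate layout are hypothetical, not taken from the thesis.

```python
import math

def probit(eta: float) -> float:
    """Standard normal CDF Phi(eta), i.e. the inverse Probit link."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def zip_probit_loglik(y, X, W, beta, gamma):
    """Log-likelihood of a ZIP regression with a Probit zero-inflation part (assumed form).

    y[i] : observed count
    X[i] : covariates for the Poisson mean, log(lambda_i) = X[i] . beta
    W[i] : covariates for the structural-zero probability, pi_i = Phi(W[i] . gamma)
    """
    total = 0.0
    for yi, xi, wi in zip(y, X, W):
        lam = math.exp(dot(xi, beta))
        pi = probit(dot(wi, gamma))
        if yi == 0:
            # a zero can come from the structural-zero state or from the Poisson state
            total += math.log(pi + (1.0 - pi) * math.exp(-lam))
        else:
            # a positive count can only come from the Poisson state
            total += math.log(1.0 - pi) + yi * math.log(lam) - lam - math.lgamma(yi + 1)
    return total
```

With an intercept-only design, the mixture probabilities over k = 0, 1, 2, ... sum to one, which is a quick sanity check on the formula.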
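The abstract states that the ZIP-Probit parameters are estimated by MCMC combining Gibbs sampling with the M-H algorithm, but it does not say which parameters receive which kind of step. The sketch below is one plausible arrangement under stated assumptions. It uses an intercept-only model, so pi = Phi(gamma) and lambda = exp(beta). Latent structural-zero indicators and Albert-Chib latent normal utilities give Gibbs updates for gamma. A random-walk M-H step updates beta. The priors N(0, prior_sd^2), the step size, and all names are hypothetical illustration choices.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for Phi and its inverse

def truncated_normal(mu, positive, rng):
    """Draw u ~ N(mu, 1) restricted to u > 0 (positive=True) or u <= 0, by inverse CDF."""
    cut = STD_NORMAL.cdf(-mu)  # P(u <= 0)
    p = rng.uniform(cut, 1.0) if positive else rng.uniform(0.0, cut)
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep inv_cdf strictly inside (0, 1)
    return mu + STD_NORMAL.inv_cdf(p)

def zip_probit_mcmc(y, n_iter=2000, burn_in=500, prior_sd=10.0, step=0.1, seed=1):
    """Gibbs + random-walk M-H sampler for an intercept-only ZIP-Probit model (sketch).

    pi = Phi(gamma) is the structural-zero probability, lambda = exp(beta) the Poisson mean.
    Returns lists of post-burn-in draws of gamma and beta.
    """
    rng = random.Random(seed)
    n = len(y)
    gamma, beta = 0.0, math.log(max(sum(y) / n, 0.1))
    draws_gamma, draws_beta = [], []
    for it in range(n_iter):
        pi, lam = STD_NORMAL.cdf(gamma), math.exp(beta)
        # Gibbs step 1: latent indicator z_i = True means "structural zero" (only possible if y_i = 0)
        p_struct = pi / (pi + (1.0 - pi) * math.exp(-lam))
        z = [yi == 0 and rng.random() < p_struct for yi in y]
        # Gibbs step 2 (Albert-Chib): latent utilities u_i ~ N(gamma, 1) with sign fixed by z_i
        u = [truncated_normal(gamma, zi, rng) for zi in z]
        # Gibbs step 3: conjugate normal update of gamma under a N(0, prior_sd^2) prior
        var = 1.0 / (n + 1.0 / prior_sd ** 2)
        gamma = rng.gauss(var * sum(u), math.sqrt(var))
        # M-H step: random-walk update of beta using only counts assigned to the Poisson state
        pois = [yi for yi, zi in zip(y, z) if not zi]

        def log_target(b):
            return sum(yi * b - math.exp(b) for yi in pois) - b ** 2 / (2.0 * prior_sd ** 2)

        proposal = beta + rng.gauss(0.0, step)
        # 1 - U lies in (0, 1], which avoids log(0)
        if math.log(1.0 - rng.random()) < log_target(proposal) - log_target(beta):
            beta = proposal
        if it >= burn_in:
            draws_gamma.append(gamma)
            draws_beta.append(beta)
    return draws_gamma, draws_beta
```

With covariates, the gamma update becomes a multivariate normal draw and beta becomes a vector, but the structure of the sweep stays the same.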
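For model comparison, the abstract uses DIC. The sketch below follows the standard Spiegelhalter et al. (2002) definition. The deviance is D(theta) = -2 log p(y | theta), and DIC = D_bar + p_D with p_D = D_bar - D(theta_bar). The thesis may use a variant, since several DIC versions exist for mixture and latent-variable models. The function name and its inputs (a list of posterior draws and a log-likelihood callable) are hypothetical.

```python
def dic(draws, loglik):
    """Deviance information criterion from posterior draws.

    draws  : list of posterior draws, each a tuple of parameters
    loglik : function theta -> log p(y | theta)
    Returns (DIC, p_D); a smaller DIC indicates the preferred model.
    """
    deviances = [-2.0 * loglik(theta) for theta in draws]
    d_bar = sum(deviances) / len(deviances)                      # posterior mean deviance
    theta_bar = tuple(sum(c) / len(draws) for c in zip(*draws))  # posterior mean of theta
    p_d = d_bar - (-2.0 * loglik(theta_bar))                     # effective number of parameters
    return d_bar + p_d, p_d
```

Plugging the posterior mean into the deviance is the classical choice. It is the reason p_D can misbehave for mixtures, where the posterior mean may be a poor point summary.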
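The abstract assesses fit with the *partial* posterior predictive p-value. That version conditions on a posterior from which the information in the test statistic has been removed. The sketch below swaps in the simpler *ordinary* posterior predictive p-value, P(T(y_rep) >= T(y) | y), for a ZIP model. It is meant only to show the replicate-and-compare mechanics. The helper names, the Knuth Poisson sampler, and the choice of the zero proportion as the discrepancy are assumptions.

```python
import math
import random

def poisson_draw(lam, rng):
    """Poisson variate by Knuth's multiplication method (adequate for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def zip_replicate(pi, lam, n, rng):
    """Simulate n counts from a ZIP(pi, lam) distribution."""
    return [0 if rng.random() < pi else poisson_draw(lam, rng) for _ in range(n)]

def posterior_predictive_pvalue(y, draws, stat, seed=0):
    """Ordinary posterior predictive p-value P(T(y_rep) >= T(y) | y).

    draws : posterior draws of (pi, lam)
    stat  : discrepancy function T(data)
    """
    rng = random.Random(seed)
    t_obs = stat(y)
    hits = sum(stat(zip_replicate(pi, lam, len(y), rng)) >= t_obs for pi, lam in draws)
    return hits / len(draws)
```

A p-value near 0 or 1 flags misfit. For example, a plain Poisson fit to data with many excess zeros yields replicated zero proportions that almost never reach the observed one.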
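The abstract prefers the ZINB model to ZIP because it can absorb overdispersion. The sketch below shows this with a ZINB probability mass function. It assumes the common mean-dispersion parameterization of the negative binomial, with mean mu and size r, so that Var = mu + mu^2 / r. Under that parameterization, the ZINB mean is (1 - pi) mu and the variance is (1 - pi) mu (1 + mu / r + pi mu). The thesis may parameterize the model differently, and the function names are hypothetical.

```python
import math

def zinb_pmf(k, pi, mu, r):
    """ZINB probability mass: structural-zero probability pi, NB mean mu, dispersion (size) r.

    The NB part has variance mu + mu**2 / r, so it is overdispersed relative to a
    Poisson with the same mean; letting r grow large recovers the Poisson.
    """
    log_nb = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
              + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))
    nb = math.exp(log_nb)
    return pi + (1.0 - pi) * nb if k == 0 else (1.0 - pi) * nb

def moments(pmf, kmax=500):
    """Mean and variance of a count pmf, truncated at kmax."""
    m = sum(k * pmf(k) for k in range(kmax))
    v = sum((k - m) ** 2 * pmf(k) for k in range(kmax))
    return m, v
```

For pi = 0.3, mu = 2, and r = 1.5, the computed variance is roughly three times the mean. That is the "variance larger than the mean" pattern the abstract describes.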
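The hierarchical model relies on the stochastic representation of the skew-normal distribution for data augmentation. The sketch below uses Azzalini's standard SN(mu, sigma^2, alpha) form, which is an assumption about the thesis's parameterization: X = mu + sigma (delta |Z0| + sqrt(1 - delta^2) Z1) with delta = alpha / sqrt(1 + alpha^2). Conditional on the half-normal latent t = |Z0|, X is normal with mean mu + sigma delta t and variance sigma^2 (1 - delta^2). This conditional normality is what keeps Gibbs updates tractable once t is added as missing data. Function names are hypothetical.

```python
import math
import random

def skew_normal_draw(mu, sigma, alpha, rng):
    """Draw from SN(mu, sigma^2, alpha) via the stochastic representation.

    Returns (x, t), where t = |Z0| is the half-normal latent variable; given t,
    x is normal with mean mu + sigma*delta*t and variance sigma^2 * (1 - delta^2).
    """
    delta = alpha / math.sqrt(1.0 + alpha ** 2)
    t = abs(rng.gauss(0.0, 1.0))
    x = mu + sigma * (delta * t + math.sqrt(1.0 - delta ** 2) * rng.gauss(0.0, 1.0))
    return x, t

def skew_normal_pdf(x, mu, sigma, alpha):
    """Azzalini density (2 / sigma) * phi(z) * Phi(alpha * z), with z = (x - mu) / sigma."""
    z = (x - mu) / sigma
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    Phi = 0.5 * (1.0 + math.erf(alpha * z / math.sqrt(2.0)))
    return 2.0 / sigma * phi * Phi
```

Setting alpha = 0 gives delta = 0 and recovers the normal case. The normal-error models criticized in the abstract are therefore a special case of this family.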
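[Degree-granting institution]: Kunming University of Science and Technology
[Degree level]: Master's
[Year awarded]: 2015
[Classification number]: O212.8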
