
An Exploratory Study of Generalized Linear Mixed Models for Binary Longitudinal Data

Posted: 2018-09-03 07:20
【Abstract】: Binary longitudinal data are widely used in medicine, psychology, the social sciences and other fields, and are especially common in clinical trials of new drugs. Because such data are not normally distributed and the observations at different time points are correlated, they do not meet the assumptions of traditional statistical methods. Methods currently available for binary longitudinal data include generalized estimating equations (GEE) and generalized linear mixed models (GLMMs). Research on GEE in the literature is relatively mature. GLMMs have been studied much less, and work on their parameter estimation methods in particular remains limited and incomplete.

Objective: To compare, by Monte Carlo simulation, the performance of the parameter estimation methods available for GLMMs in the analysis of binary longitudinal data, and to examine how sample size, covariance structure and missing data affect each method.

Methods: In a Monte Carlo simulation study, the estimation methods were compared on five criteria: bias, mean squared error (MSE), average MSE and maximum MSE (both taken across time points), and 95% confidence interval coverage. The study examined how sample size, covariance structure, missing-data mechanism and missing-data proportion affect these criteria. The findings were then applied to the analysis of binary longitudinal data from a clinical trial.

Results: With a large sample and no missing data, the numerical integration approximation produced estimates with smaller bias, higher 95% confidence interval coverage, and lower MSE, average MSE and maximum MSE. This held whether the covariance structure was compound symmetric or unstructured. In other words, numerical integration analyzes binary longitudinal data with larger samples more accurately and more stably. Its advantage was most pronounced when the random-effect variance was large (≥ 1). When the random-effect variance was below 1, numerical integration and the linearization methods gave very similar bias and roughly equal 95% confidence interval coverage.

These large-sample conclusions do not carry over to small samples. With small samples, the linearization methods RSPL and MSPL were more stable, and RMPL and MMPL achieved higher 95% confidence interval coverage. The linearization methods also estimated the random-effect parameters more accurately, so linearization is preferable for small-sample binary longitudinal data.

The robustness of the estimation methods under an unstructured covariance also depended on sample size. With small samples, RSPL and MSPL produced smaller MSE, average MSE and maximum MSE than the other methods. They also yielded a higher proportion of positive-definite G matrices, which makes them the more robust choices. With large samples, numerical integration performed better, with smaller bias and higher 95% confidence interval coverage, and it also had a clear advantage in convergence. The numerical integration methods are therefore more robust for large-sample data with an unstructured covariance.

With missing data, the results were similar under missing completely at random (MCAR) and missing at random (MAR). When the proportion of missing data was small, numerical integration gave parameter estimates with relatively smaller bias, higher 95% confidence interval coverage and better stability. When the proportion was high, RSPL and MSPL outperformed numerical integration: their estimates had smaller bias, higher 95% confidence interval coverage and greater stability. In the clinical trial example, the sample was large and the proportion of missing data very low, so numerical integration was the appropriate estimation method. The between-group log odds ratios and their 95% confidence intervals did not differ noticeably across the numerical integration methods.

Conclusion: When a GLMM is used to analyze binary longitudinal data, the parameter estimation method should be chosen according to the sample size, the covariance structure and the extent of missing data. With no or little missing data, numerical integration is advantageous for large samples and large random-effect variances, whereas linearization performs better for small samples. When the proportion of missing data is high, the linearization methods RSPL and MSPL are more accurate and stable than numerical integration.
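The simulation design described in the abstract can be illustrated with a minimal sketch. It assumes a random-intercept logistic GLMM, which induces a compound-symmetric correlation on the latent scale, plus a binary group indicator, a linear time trend and monotone dropout after baseline. The function name `simulate_binary_longitudinal`, its parameters and the dropout rule are hypothetical choices for illustration, not the thesis's actual design. The unstructured-covariance scenarios in particular would need a richer random-effects model.

```python
import numpy as np


def simulate_binary_longitudinal(n_subjects, n_times, beta0, beta_time, beta_group,
                                 sigma2_u, missing="none", p_dropout=0.1, seed=None):
    """Simulate binary repeated measures from a random-intercept logistic GLMM.

    Hypothetical helper for illustration. Returns (group, y), where y has shape
    (n_subjects, n_times) and missing values are stored as NaN.
    """
    if missing not in ("none", "mcar", "mar"):
        raise ValueError("missing must be 'none', 'mcar' or 'mar'")
    rng = np.random.default_rng(seed)
    group = rng.integers(0, 2, size=n_subjects)              # 0 = control, 1 = treatment
    time = np.arange(n_times)
    u = rng.normal(0.0, np.sqrt(sigma2_u), size=n_subjects)  # random intercepts
    eta = beta0 + beta_time * time[None, :] + beta_group * group[:, None] + u[:, None]
    prob = 1.0 / (1.0 + np.exp(-eta))                        # inverse logit
    y = (rng.random((n_subjects, n_times)) < prob).astype(float)

    if missing != "none":
        # Monotone dropout after baseline. The hazard must stay <= 1,
        # so p_dropout should not exceed 2/3 in the MAR branch.
        for i in range(n_subjects):
            for t in range(1, n_times):
                if missing == "mcar":
                    hazard = p_dropout                       # unrelated to any data
                else:
                    # MAR: depends only on the previous response, which is observed
                    hazard = p_dropout * (1.5 if y[i, t - 1] == 1 else 0.5)
                if rng.random() < hazard:
                    y[i, t:] = np.nan                        # missing for the rest of follow-up
                    break
    return group, y
```

You can mirror the random-effect variance contrast in the results by setting `sigma2_u` at or above 1 versus below 1, and the missing-data proportion by varying `p_dropout`.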
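The numerical integration approximation evaluates the GLMM marginal likelihood by integrating the random effect out numerically. For a random intercept u ~ N(0, σ²), substituting u = √2·σ·x gives L_i ≈ (1/√π) Σ_k w_k ∏_t f(y_it | u = √2·σ·x_k), with Gauss-Hermite nodes x_k and weights w_k. The sketch below uses ordinary (non-adaptive) Gauss-Hermite quadrature as a simplified stand-in. Software such as SAS PROC GLIMMIX (METHOD=QUAD) uses adaptive quadrature instead. The function and argument names are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss


def marginal_loglik_gh(beta, sigma_u, X, y, subject, n_nodes=20):
    """Marginal log-likelihood of a random-intercept logistic model (hypothetical helper).

    beta: fixed effects, shape (p,); sigma_u: SD of the random intercept;
    X: design matrix, shape (N, p); y: 0/1 responses, shape (N,);
    subject: integer subject ids, shape (N,).
    """
    nodes, weights = hermgauss(n_nodes)   # rule for integrals of the form ∫ exp(-x^2) g(x) dx
    eta_fixed = X @ beta
    total = 0.0
    for s in np.unique(subject):
        rows = subject == s
        # Linear predictor at every node: shape (observations of s, n_nodes)
        eta = eta_fixed[rows][:, None] + np.sqrt(2.0) * sigma_u * nodes[None, :]
        # Bernoulli log-likelihood y*eta - log(1 + exp(eta)), computed stably
        ll = y[rows][:, None] * eta - np.logaddexp(0.0, eta)
        log_f = ll.sum(axis=0)            # conditional log-likelihood of subject s at each node
        # log[(1/sqrt(pi)) * sum_k w_k * exp(log_f_k)] via log-sum-exp
        m = log_f.max()
        total += m + np.log(np.sum(weights * np.exp(log_f - m))) - 0.5 * np.log(np.pi)
    return total
```

With `sigma_u = 0`, the quadrature weights sum to √π, so the result reduces to the ordinary logistic log-likelihood. That provides a quick sanity check.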
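The linearization methods replace the GLMM with a linear mixed model for pseudo-data, obtained from a first-order Taylor expansion of the inverse link. The names RSPL, MSPL, RMPL and MMPL match the METHOD= options of SAS PROC GLIMMIX. The first letter indicates residual (REML-type) or maximum likelihood estimation on the pseudo-data. The S or M indicates expansion about the current random-effect predictions (subject-specific) or about zero (marginal). The sketch below computes only one pseudo-data step for a logit link. The function name and arguments are hypothetical, and the subsequent weighted linear mixed model fit is not shown.

```python
import numpy as np


def pseudo_data(y, X, Z, beta, u, expansion="subject"):
    """One linearization step for a Bernoulli GLMM with logit link (hypothetical helper).

    Returns the pseudo-response y* and the working weights mu*(1 - mu).
    """
    if expansion == "subject":
        eta = X @ beta + Z @ u          # expand about current predictions (S in RSPL/MSPL)
    elif expansion == "marginal":
        eta = X @ beta                  # expand about u = 0 (M in RMPL/MMPL)
    else:
        raise ValueError("expansion must be 'subject' or 'marginal'")
    mu = 1.0 / (1.0 + np.exp(-eta))
    var = mu * (1.0 - mu)               # Bernoulli variance, equal to d(mu)/d(eta) for the logit link
    y_star = eta + (y - mu) / var       # first-order Taylor linearization of the response
    return y_star, var                  # var serves as the weight (inverse residual variance)
```

In a full fit, the pseudo-data would be passed to a weighted linear mixed model estimated by REML or ML, and the step would be repeated until the estimates converge.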
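The evaluation criteria named in the abstract can be computed from the simulation replicates as follows. This sketch rests on assumptions the abstract does not spell out:

- Estimates and standard errors are stored as hypothetical (replications × time points) arrays.
- The 95% intervals are Wald intervals (estimate ± 1.96 × SE).
- Average and maximum MSE are taken over the per-time-point MSEs.
- Positive definiteness of an estimated G matrix is tested with a Cholesky factorization.

```python
import numpy as np

Z_975 = 1.959963984540054  # standard normal 97.5% quantile, for two-sided 95% Wald intervals


def performance_summary(estimates, std_errors, true_values):
    """Monte Carlo criteria for one estimation method (hypothetical helper).

    estimates, std_errors: shape (R, T), R replications by T time points.
    true_values: shape (T,), the data-generating parameter values.
    """
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    truth = np.asarray(true_values, dtype=float)

    bias = est.mean(axis=0) - truth                # per time point
    mse = ((est - truth) ** 2).mean(axis=0)        # per time point
    lower, upper = est - Z_975 * se, est + Z_975 * se
    coverage = ((lower <= truth) & (truth <= upper)).mean(axis=0)

    return {
        "bias": bias,
        "mse": mse,
        "average_mse": mse.mean(),  # averaged over time points (assumed definition)
        "maximum_mse": mse.max(),   # worst time point (assumed definition)
        "coverage_95": coverage,
    }


def proportion_positive_definite(g_matrices):
    """Share of estimated random-effect covariance matrices G that are positive definite.

    Assumes each matrix is symmetric; Cholesky fails for non-positive-definite input.
    """
    count = 0
    for g in g_matrices:
        try:
            np.linalg.cholesky(np.asarray(g, dtype=float))
            count += 1
        except np.linalg.LinAlgError:
            pass
    return count / len(g_matrices)
```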
【Degree-granting institution】: Fudan University
【Degree level】: Master's
【Year of award】: 2014
【CLC number】: R96; O242.2
