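A Simulation Study of the Propensity Score Method and Its Handling of Collinear Data
Published: 2018-05-25 05:49
Topics: propensity score + multicollinearity; Source: Nantong University, 2012 master's thesis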
[Abstract]: Objective: To examine the statistical properties and practical characteristics of propensity score (PS) regression when applied to multicollinear data, by comparing its results with those of conventional logistic regression.
Methods: Monte Carlo (MC) simulation was used, varying three factors across several levels: sample size, the correlation between the covariates and the exposure variable, and the positive rate of the outcome variable. PS regression and logistic regression were compared on multicollinear data, and the interactions among the three factors were examined. A real-data example was then used to check the simulation results and to further illustrate the feasibility and practicality of PS regression for multicollinear data.
Results: (1) With the positive rate of the outcome fixed at 4% and a high correlation between the covariates and the exposure (r = 0.92), the regression coefficients from PS regression were closer to the standard-model estimates than those from logistic regression. As the sample size increased, however, the coefficient estimates converged and the estimation error shrank.
(2) With the sample size fixed, the PS regression coefficient followed the same trend as the standard model as the covariate-exposure correlation changed, and the gap between the two did not widen as the correlation rose. The logistic regression coefficient and its standard error, in contrast, began to grow and move away from the standard values once the correlation passed a certain level (about r = 0.5 for n = 1000 and about r = 0.3 for n = 500). The advantage of PS regression over ordinary logistic regression in handling collinearity was also more pronounced in smaller samples.
(3) With the sample size fixed and a high covariate-exposure correlation (r = 0.92), the regression coefficients and standard errors from PS regression were closer to the standard model than those from logistic regression, but this advantage shrank as the positive rate increased.
Conclusion: Based on these results, parameter estimates from PS regression are more reliable than those from logistic regression when the data are multicollinear. PS regression deserves particular consideration when the sample is small, the positive rate of the outcome is low, and collinearity among the variables is high, in order to avoid biased parameter estimates.
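The comparison described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the thesis's simulation design: the data-generating model, the function names (`fit_logistic`, `one_replicate`, `simulate`), and the parameters (`gamma`, `intercept`, `beta_t`) are all assumptions made here. In particular, the thesis compares against a "standard model" that the abstract does not define, so the sketch simply reports estimates next to the true exposure coefficient used to generate the data. PS regression is taken in its usual form: estimate e(X) = P(T = 1 | X) from the covariates, then regress the outcome on the exposure and e(X).

```python
import numpy as np

def sigmoid(z):
    # Clip the linear predictor to avoid overflow in exp().
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def fit_logistic(X, y, n_iter=50, ridge=1e-6):
    """Newton-Raphson logistic fit; X must already contain an intercept column.
    The tiny ridge term only keeps the Hessian invertible (an assumption of
    this sketch, not part of the thesis)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        w = p * (1.0 - p)
        hessian = X.T @ (X * w[:, None]) + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hessian, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    return beta

def one_replicate(rng, n, gamma, intercept, beta_t):
    # Hypothetical data-generating model: two covariates drive both the
    # exposure t (strength gamma) and the outcome y.
    x = rng.normal(size=(n, 2))
    t = rng.binomial(1, sigmoid(gamma * x.sum(axis=1)))
    y = rng.binomial(1, sigmoid(intercept + beta_t * t + 0.5 * x.sum(axis=1)))
    ones = np.ones(n)
    # Conventional logistic regression: y ~ t + x1 + x2.
    b_full = fit_logistic(np.column_stack([ones, t, x]), y)
    # PS regression: estimate e(X) = P(t = 1 | x), then fit y ~ t + e(X).
    design_ps = np.column_stack([ones, x])
    ps = sigmoid(design_ps @ fit_logistic(design_ps, t))
    b_ps = fit_logistic(np.column_stack([ones, t, ps]), y)
    corr = np.corrcoef(t, x[:, 0])[0, 1]
    return b_full[1], b_ps[1], corr

def simulate(n=500, gamma=2.0, intercept=-3.5, beta_t=0.7, reps=200, seed=0):
    """Repeat the comparison and summarise the exposure-coefficient estimates."""
    rng = np.random.default_rng(seed)
    est = np.array([one_replicate(rng, n, gamma, intercept, beta_t)
                    for _ in range(reps)])
    return {
        "true_beta_t": beta_t,
        "logistic_mean": est[:, 0].mean(),
        "logistic_sd": est[:, 0].std(ddof=1),
        "ps_mean": est[:, 1].mean(),
        "ps_sd": est[:, 1].std(ddof=1),
        "mean_corr_t_x1": est[:, 2].mean(),
    }

if __name__ == "__main__":
    for key, value in simulate().items():
        print(f"{key}: {value:.3f}")
```

Raising `gamma` strengthens the covariate-exposure correlation, lowering `intercept` lowers the outcome positive rate, and `n` sets the sample size, so the three factors named in the abstract can each be varied. Because logistic models are non-collapsible, the two coefficients need not target exactly the same quantity, so differences between them should be read with that caveat in mind.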
[Degree-granting institution]: Nantong University
[Degree level]: Master's
[Year conferred]: 2012
[Classification number]: R181.1
Article ID: 1932374
Article link: https://www.wllwen.com/yixuelunwen/liuxingb/1932374.html