Application of Confirmatory Factor Analysis for Non-normal Data to the Overall Effect of Genes
Published: 2018-10-14 21:01
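[Abstract]: In the post-genomic era, research on single-nucleotide polymorphisms (SNPs) has become a focus of biomedical research. SNPs are the most common form of human sequence variation, they are widely distributed throughout human DNA, and they can now be detected automatically. As a result, statistical methods suited to SNP data have become an active research topic in statistical genetics. Some researchers have introduced latent structural models (latent variable models) into association analysis of the overall effect of haplotypes or high-dimensional SNPs, and into related inference. Latent variable models, however, assume that the observed and latent variables are normally distributed, and SNP data violate this assumption under any genetic coding model. To address this non-normality, this thesis fits confirmatory factor models with the Satorra-Bentler (S-B) estimation method to analyze the overall effect of SNPs and their association with disease.

The thesis describes the theory of the confirmatory factor model in detail, covering the model overview, parameter estimation, assessment of model fit, and model modification. It concentrates on several parameter estimation methods: maximum likelihood (ML) estimation, Browne's asymptotically distribution-free (ADF) method, and S-B scaled estimation. After comparing these methods, it concludes that S-B estimation is the most suitable for SNP data.

Building on this theory, the SNP data provided by GAW17 were analyzed as a worked example. Thirteen SNPs located in 6 genes on chromosome 2 were randomly selected for study. The ML method gave a chi-square to degrees-of-freedom ratio of χ²/df = 3.59 and RMSEA = 0.061, while the S-B scaled method gave χ²/df = 2.89 and RMSEA = 0.052. The S-B method therefore produced better fit indices than ML, indicating that S-B estimation yields a better-fitting model for SNP data. In addition, because the correlations among the 6 genes were high, the genes were treated as first-order factors in a second-order confirmatory factor analysis, which produced a parsimonious second-order model with good fit. Using the GAW17 simulated data, latent variable scores were computed for the 6 genes and compared with infection status using t tests. All 6 genes were associated with infection, suggesting that the 13 SNP loci within them may be causal loci. A test of the second-order factor against infection status showed that gene A affects infection (t = 3.657, P < 0.001).

The discussion summarizes the main content of the study, compares ML estimation with S-B scaled estimation and the higher-order confirmatory factor model with the first-order model, and sets out the strengths and limitations of the study and directions for future research.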
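The fit comparison above rests on two standard indices, χ²/df and RMSEA, plus the Satorra-Bentler rescaling of the ML chi-square. The Python sketch below is not the thesis's code: the function names are hypothetical, the RMSEA point estimate uses the common (N - 1) convention, and the sample size N = 697 is an assumption because the abstract does not report N. Under that assumption the reported χ²/df values reproduce the reported RMSEA values, and because the S-B statistic keeps the ML degrees of freedom, the ratio of the two χ²/df values gives the implied scaling factor c.

```python
import math


def chi_square_ratio(chi_square: float, df: int) -> float:
    """Normed chi-square, chi^2 / df."""
    return chi_square / df


def rmsea(chi_square: float, df: int, n: int) -> float:
    """RMSEA point estimate, (N - 1) convention:
    sqrt(max(chi^2 - df, 0) / (df * (N - 1)))."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


def rmsea_from_ratio(ratio: float, n: int) -> float:
    """Same RMSEA written in terms of chi^2/df only:
    sqrt(max(ratio - 1, 0) / (N - 1))."""
    return math.sqrt(max(ratio - 1.0, 0.0) / (n - 1))


def implied_sample_size(ratio: float, rmsea_value: float) -> float:
    """Invert rmsea_from_ratio: the N implied by a (chi^2/df, RMSEA) pair."""
    return (ratio - 1.0) / rmsea_value ** 2 + 1.0


def sb_scaled_chi_square(t_ml: float, scaling_c: float) -> float:
    """Satorra-Bentler scaled statistic T_SB = T_ML / c (same df as ML)."""
    return t_ml / scaling_c


if __name__ == "__main__":
    ml_ratio, sb_ratio = 3.59, 2.89  # chi^2/df values reported in the abstract
    n_assumed = 697                  # ASSUMPTION: N is not stated in the abstract
    print(round(rmsea_from_ratio(ml_ratio, n_assumed), 3))  # 0.061
    print(round(rmsea_from_ratio(sb_ratio, n_assumed), 3))  # 0.052
    # T_SB and T_ML share df, so (T_ML/df) / (T_SB/df) equals c.
    print(round(ml_ratio / sb_ratio, 3))  # 1.242
```

Computing c itself requires the model Jacobian and the asymptotic covariance matrix of the sample moments (c = tr(UΓ)/df), so in practice it is taken from SEM software; for example, lavaan's `estimator = "MLM"` reports the Satorra-Bentler scaled test. Some programs also report a separate "robust" RMSEA for the scaled statistic, so the convention used in the thesis may differ from this sketch.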
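[Degree-granting institution]: Shanxi Medical University
[Degree level]: Master's
[Year degree awarded]: 2012
[Classification number (CLC)]: R346
Article ID: 2271633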
Article link: https://www.wllwen.com/xiyixuelunwen/2271633.html