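Construction of the Mixed Latent Variable Model and Its Application in Genetic Association Analysis
Published: 2018-07-07 10:40
Topics: mixed latent variable model + single nucleotide polymorphisms (SNPs); Source: Shanxi Medical University, master's thesis, 2012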
[Abstract]: Structural equation mixture modeling (SEMM), also called the mixed latent variable model, is a theoretical framework for data that contain both categorical and continuous latent variables. As a second-generation structural equation model, SEMM combines ideas from factor analysis, latent class analysis and latent profile analysis, and offers a new approach to analyzing latent variables. It overcomes two limitations: structural equation models handle only continuous latent variables, and latent class analysis handles only categorical latent variables. It also gives researchers in medicine, the social sciences and psychology a new way to approach complex data. These properties make the method well suited to the complex data that keep emerging in modern medicine, so introducing SEMM into medical research has practical value.
This thesis systematically presents the theory of the mixed latent variable model, including the theory of its submodels and the model's construction, parameter estimation and evaluation. For parameter estimation it covers conventional maximum likelihood (ML) estimation and iterative maximum likelihood estimation with the EM algorithm, a widely used iterative method for computing maximum likelihood estimates that is often applied to data with missing values. The model evaluation indices include the Akaike information criterion (AIC), the Bayesian information criterion (BIC), the consistent Akaike information criterion (CAIC) and the integrated completed likelihood criterion with BIC (ICL-BIC).
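The EM iteration and the information criteria named above can be illustrated on a much simpler model than SEMM. The sketch below is not the thesis's procedure: it fits a two-component one-dimensional Gaussian mixture, and the function names (`em_two_gaussians`, `information_criteria`), the starting values and the simulated data are all assumptions made for illustration. The criteria use their common textbook forms, with ICL-BIC taken as BIC plus twice the entropy of the posterior class probabilities.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def em_two_gaussians(data, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture (illustrative only).

    Returns (weights, means, sds, trace), where trace holds the
    log-likelihood evaluated at the start of each iteration.
    """
    lo, hi = min(data), max(data)
    spread = (hi - lo) / 4.0 or 1.0      # fall back to 1.0 if all values are equal
    weights, means, sds = [0.5, 0.5], [lo, hi], [spread, spread]
    trace = []
    for _ in range(n_iter):
        # E-step: posterior class probabilities for every observation.
        post, log_lik = [], 0.0
        for x in data:
            dens = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
            total = sum(dens)
            log_lik += math.log(total)
            post.append([d / total for d in dens])
        trace.append(log_lik)
        # M-step: re-estimate weights, means and standard deviations.
        for k in range(2):
            nk = sum(p[k] for p in post)
            weights[k] = nk / len(data)
            means[k] = sum(p[k] * x for p, x in zip(post, data)) / nk
            var = sum(p[k] * (x - means[k]) ** 2 for p, x in zip(post, data)) / nk
            sds[k] = max(math.sqrt(var), 1e-6)   # guard against a collapsing component
    return weights, means, sds, trace


def information_criteria(log_lik, n_params, n_obs, posteriors=None):
    """AIC, BIC, CAIC and (if posteriors are given) ICL-BIC; smaller is better."""
    deviance = -2.0 * log_lik
    out = {
        "AIC": deviance + 2 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
        "CAIC": deviance + n_params * (math.log(n_obs) + 1),
    }
    if posteriors is not None:
        # Entropy of the class assignment; zero when every case is classified with certainty.
        entropy = -sum(p * math.log(p) for row in posteriors for p in row if p > 0)
        out["ICL-BIC"] = out["BIC"] + 2 * entropy
    return out


if __name__ == "__main__":
    random.seed(1)
    # Simulated data, not the thesis data.
    sample = [random.gauss(-2.0, 1.0) for _ in range(200)]
    sample += [random.gauss(3.0, 1.0) for _ in range(200)]
    w, m, s, ll = em_two_gaussians(sample)
    # 5 free parameters: 1 mixing weight, 2 means, 2 standard deviations.
    print(w, m, s)
    print(information_criteria(ll[-1], n_params=5, n_obs=len(sample)))
```

In practice the criteria are compared across candidate models with different numbers of latent classes, and the model with the smallest value is preferred; the entropy term in ICL-BIC additionally penalizes solutions whose classes are poorly separated.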
Building on this theory, the thesis analyzes two kinds of simulated data, one for the factor analysis mixture model and one for the structural equation mixture model. The applied part analyzes measured SNP data with the mixed latent variable model. The data were provided by GAW17 and contain more than ten thousand SNPs on the 22 autosomes of 697 individuals, together with traits simulated for those 697 individuals from the SNPs (3 quantitative traits and 1 qualitative trait). Four SNPs on chromosome 1 and the three quantitative traits were selected as study variables, and latent class analysis and mixed structural equation model analysis were carried out in turn. Based on the four SNPs, the population was divided into three latent classes with probabilities 0.53, 0.34 and 0.13. The factor means of Q in latent classes 1 and 2 were -4.029 and -2.052, respectively, with the factor mean of Q in latent class 3 fixed at 0; the factor means of classes 1 and 2 were therefore both lower than that of class 3 (P < 0.001).
The discussion briefly explains the significance of the study and examines each stage of the structural equation mixture model: model construction, parameter estimation and model evaluation. It also describes the strengths and weaknesses of the study and directions for future work.
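To make the latent class result concrete, the following minimal sketch shows how an individual's four SNP genotypes could be turned into posterior class membership probabilities with Bayes' rule, assuming the genotypes are conditionally independent within a class. Only the class probabilities 0.53, 0.34 and 0.13 come from the abstract. The conditional genotype probabilities in `GENOTYPE_PROBS`, the 0/1/2 genotype coding and the function name are hypothetical, since the abstract does not report them.

```python
# Class probabilities reported in the abstract.
CLASS_PROBS = [0.53, 0.34, 0.13]

# HYPOTHETICAL values: P(genotype = 0, 1, 2 | class) for each of the 4 SNPs.
# The abstract does not report these; they are invented for illustration.
GENOTYPE_PROBS = [
    [[0.80, 0.15, 0.05]] * 4,   # class 1
    [[0.30, 0.50, 0.20]] * 4,   # class 2
    [[0.10, 0.30, 0.60]] * 4,   # class 3
]


def posterior_class_probs(genotypes, class_probs, genotype_probs):
    """Posterior P(class | genotypes) under within-class independence."""
    joint = []
    for prior, snp_tables in zip(class_probs, genotype_probs):
        likelihood = prior
        for genotype, table in zip(genotypes, snp_tables):
            likelihood *= table[genotype]
        joint.append(likelihood)
    total = sum(joint)
    return [j / total for j in joint]


if __name__ == "__main__":
    # One individual's genotypes at the four SNPs, coded 0/1/2 (assumed coding).
    print(posterior_class_probs([0, 1, 0, 0], CLASS_PROBS, GENOTYPE_PROBS))
```

Each individual is usually assigned to the class with the largest posterior probability, and the between-class comparison of factor means then proceeds on those classes.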
[Degree-granting institution]: Shanxi Medical University
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: R346
Article ID: 2104713
Article link: https://www.wllwen.com/xiyixuelunwen/2104713.html