Association Analysis Methods for Complex Traits and Multiple Genomic Loci
Published: 2018-09-02 10:05
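[Summary]: The genetic architecture underlying complex traits involves many genes, and multiple variant sites at the cis- and trans-acting positions of these genes can interact to jointly influence a complex trait. Joint analysis of multiple loci therefore yields more information than single-locus analysis. A haplotype is a set of associated SNP loci located on one chromosome or within a given region. Haplotype analysis thus offers a more convenient and effective way to use SNP information to explore the genetic mechanisms of traits, especially complex traits. Accordingly, this thesis studies SNP interactions within single genes and also uses haplotypes to consider, at the genome level, the joint action of genes in unlinked regions. The main research contents and results are as follows:
First, association analysis between multiple loci in a single region and complex traits based on a semiparametric regression model
Mapping associations between complex traits and functional loci within a region has attracted much attention. The usual approach analyzes each SNP in a region separately, but incomplete linkage disequilibrium between the SNP markers and the trait locus can reduce power. The loci can also be analyzed jointly, for example through haplotypes, but power may decline when there are many haplotypes. We therefore build on the semiparametric regression model of Kwee et al. for quantitative trait loci, which uses information from multiple SNPs simultaneously and accounts for linkage disequilibrium among loci, yet has lower dimension than existing joint multi-locus methods. To address a limitation of the Kwee et al. model, we add handling of missing data. In addition, a step-down P-value procedure is used to select sets of SNP markers jointly associated with the quantitative trait. Human prostate cancer is a common disease that threatens many lives and draws worldwide attention. Using public HapMap data, we analyze cis- and trans-regulation between 67 genes in the human prostate cancer pathway that have lymphocyte expression data and 339 candidate genes, identify the cis- and trans-acting factors that affect gene expression in the pathway, and carry out a pathway analysis.
Second, association analysis between multi-region haplotypes and complex traits based on a parametric regression model
The genetic basis of complex traits involves many genes, and joint action among them is common, so it is preferable to consider multiple genes and multiple regions at the same time. We propose an association analysis, based on a generalized linear regression model, between a complex trait and the joint haplotypes of multiple unlinked regions, and test the null hypothesis of no haplotype effect with a score statistic. The best combination of loci across unlinked regions is obtained with a minimum-P-value multiple-testing procedure. Simulation studies examine the accuracy and power of the proposed method and confirm the validity of the model and its ability to detect haplotype interactions. For data without additional covariates, comparison with the htr and hapcc models of the FAMHAP software shows that our method matches or exceeds them in accuracy and power. Our model can also handle more trait types and allows additional covariates. To verify the method, we apply it to an association analysis between four unlinked candidate genes and pork quality.
Third, association analysis between multi-region haplotypes and complex traits based on a semiparametric regression model
The genetic pattern of a complex trait generally involves multiple genes and the interactions among them. We propose a new statistical method that works at the haplotype level to identify multiple genomic regions affecting variation in a continuous trait. The method uses a semiparametric regression model with a kernel function, which can consider a large number of genes simultaneously and reduces dimension more effectively than existing methods. For parameter estimation and testing of the nonparametric function we follow Liu et al. and Kwee et al.: parameters are estimated by the least squares kernel machine (LSKM), and the nonparametric function is tested with a score statistic. The best combination of genes or regions is selected with a step-down P-value procedure. Simulation studies demonstrate the accuracy of the method and its power to detect multiple genes. We apply the method to an association analysis between KLK3 expression in the human prostate cancer pathway and 339 candidate genes, and find the group of genes affecting KLK3 expression, obtaining more information than the single-gene analysis of the previous part. We also apply the method to study the genetic mechanism of pork quality.
Fourth, association analysis between multi-region haplotypes and binary traits based on a semiparametric logistic kernel model
New statistical methods for testing the genetic pathways of disease are receiving growing attention. Genes in a pathway tend to interact with one another, and traditional parametric estimation becomes infeasible because the dimension is too large, which makes nonparametric methods preferable. By fitting high-dimensional genomic haplotype information with a kernel machine function, we propose an efficient and flexible semiparametric logistic model for analyzing and testing genetic pathways that link genes across the genome to disease. Following Liu et al., we express the semiparametric model as a logistic mixed model, estimate the parameters with existing statistical software, and test the nonparametric function with a score statistic. Simulation studies demonstrate the accuracy of the method and its power to test disease pathways. The method is applied to a pathway analysis of data from multiple myeloma patients with osteonecrosis of the jaw under phosphate treatment.
Fifth, association analysis between multi-region haplotypes and longitudinal traits based on a semiparametric regression model
In studies of longitudinal data with repeated records, it is important to account for both time and other covariates that affect the trait. Building on the semiparametric model for longitudinal data of Zhang et al., we use the parametric fixed effects of the model to fit haplotype effects and other fixed covariate effects, estimate the parameters by the method of Zhang et al., and test the haplotype effect with a likelihood ratio test. Simulation comparisons of our modified method with a general mixed model, Haplo.stats, and the htr model of FAMHAP confirm that, for dynamic traits, accounting for the time effect in repeatedly sampled data detects haplotype effects better than analyzing a single sample. We use this semiparametric regression model to analyze the repeated reproductive records of sows in relation to haplotypes of the MMP1 and MMP10 genes.
In summary, to address existing problems in genome research, this thesis establishes an association analysis, based on a generalized linear model, between complex traits and the joint haplotypes of multiple unlinked regions, and kernel-based semiparametric regression models for analyzing the genetic patterns of static and dynamic data. Simulation studies confirm the reliability of the models, which are applied to several real examples. The results can advance candidate gene studies of complex traits and lay a theoretical foundation for genome-level studies such as genetic pathway analysis of complex traits. The algorithms have also been implemented as software that can be downloaded freely, giving researchers more comprehensive and accurate tools for genome association analysis.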
[Abstract]: The genetic architecture underlying complex traits involves many genes. Multiple variants at the cis- and trans-acting positions of these genes can interact to influence complex traits, so joint analysis of multiple loci yields more information than single-locus analysis. A haplotype is a set of associated SNP loci located on one chromosome or within a region. Haplotype analysis therefore provides a more convenient and effective way to use SNP information to explore the genetic mechanisms of traits, especially complex traits. The main research contents and results are as follows:
First, association analysis between multiple loci in a single region and complex traits based on a semiparametric regression model
Mapping associations between complex traits and functional loci within a region has attracted considerable attention. The common approach analyzes each SNP in a region separately, but incomplete linkage disequilibrium between the SNP markers and the trait locus can reduce the power of the analysis. We therefore build on the semiparametric regression model of Kwee et al. for quantitative trait loci, which uses information from multiple SNPs simultaneously and accounts for linkage disequilibrium among loci while having lower dimension than existing joint multi-locus methods. A step-down P-value procedure selects sets of SNP markers jointly associated with a quantitative trait. Using public HapMap data, we analyzed cis- and trans-regulation between 67 genes in the prostate cancer pathway that have lymphocyte expression data and 339 candidate genes, and identified cis- and trans-acting factors affecting gene expression in the pathway.
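As a rough illustration of how a kernel lets several SNPs in one region enter the model together, the sketch below turns multi-SNP genotypes into a single subject-by-subject similarity matrix, so the model size depends on the number of subjects rather than the number of SNPs. The abstract does not say which kernel is used; the identity-by-state (IBS) kernel, the 0/1/2 genotype coding, the function names, and the data are all assumptions made for illustration.

```python
def ibs_kernel(g1, g2):
    """Identity-by-state similarity between two subjects.

    g1, g2: genotypes at the same SNPs, coded as minor-allele counts (0, 1, 2).
    Returns a value in [0, 1]; 1 means identical genotypes at every SNP.
    """
    if len(g1) != len(g2) or not g1:
        raise ValueError("genotype vectors must be non-empty and of equal length")
    # At each SNP two subjects share 2 - |difference in allele count| alleles IBS.
    shared = sum(2 - abs(a - b) for a, b in zip(g1, g2))
    return shared / (2 * len(g1))


def kernel_matrix(genotypes):
    """Build the n-by-n kernel matrix for n subjects (hypothetical helper)."""
    n = len(genotypes)
    return [[ibs_kernel(genotypes[i], genotypes[j]) for j in range(n)]
            for i in range(n)]


# Made-up example: three subjects typed at four SNPs in one region.
genotypes = [
    [0, 1, 2, 0],
    [0, 1, 2, 1],
    [2, 1, 0, 2],
]
K = kernel_matrix(genotypes)
for row in K:
    print(row)
```

Adding more SNPs to the region changes the entries of `K` but not its size, which is one way to read the claim that the model has lower dimension than other joint multi-locus approaches.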
Second, association analysis between multi-region haplotypes and complex traits based on a parametric regression model
The genetic basis of complex traits involves many genes, and joint action among them is common, so it is preferable to consider multiple genes and multiple regions at the same time. We propose an association analysis, based on a generalized linear regression model, between complex traits and the joint haplotypes of multiple unlinked regions, and test the null hypothesis of no haplotype effect with a score statistic. The best combination of loci across unlinked regions is obtained with a minimum-P-value multiple-testing procedure. Simulation studies examined the accuracy and power of the proposed method and confirmed the validity of the model and its ability to detect haplotype interactions. For data without additional covariates, comparison with the htr and hapcc models of the FAMHAP software showed that the accuracy and power of our method were comparable to, and in some cases exceeded, those of htr and hapcc. In addition, our model can handle more types of traits and allows other covariates to be added. To verify the method, we applied it to an association analysis between four unlinked candidate genes and pork quality.
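The idea of testing a haplotype effect with a score statistic can be sketched in a deliberately reduced setting: one haplotype, a quantitative trait, and no covariates, so the statistic is a scalar referred to a chi-square distribution with 1 degree of freedom. The thesis works with a generalized linear model over several unlinked regions; this simplification, the function name, and the data below are assumptions for illustration only.

```python
import math


def haplotype_score_test(dosage, trait):
    """Score test of 'no haplotype effect' in a linear model with an intercept.

    dosage: copies of the haplotype carried by each subject (0, 1 or 2).
    trait:  quantitative trait values, one per subject.
    Returns (statistic, p_value); the statistic is chi-square with 1 df under the null.
    """
    n = len(dosage)
    if n != len(trait) or n < 3:
        raise ValueError("need matching vectors with at least 3 subjects")
    x_bar = sum(dosage) / n
    y_bar = sum(trait) / n
    # Score for the haplotype effect evaluated at the null (up to the factor 1/sigma^2).
    score = sum((x - x_bar) * (y - y_bar) for x, y in zip(dosage, trait))
    sxx = sum((x - x_bar) ** 2 for x in dosage)
    sigma2 = sum((y - y_bar) ** 2 for y in trait) / n  # variance under the null
    if sxx == 0 or sigma2 == 0:
        raise ValueError("dosage and trait must both vary")
    statistic = score ** 2 / (sigma2 * sxx)
    # Upper tail of a chi-square with 1 df: P(X > t) = erfc(sqrt(t / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Made-up data: haplotype copies and a trait value for eight subjects.
dosage = [0, 0, 1, 1, 1, 2, 2, 0]
trait = [4.1, 3.8, 5.0, 5.4, 4.7, 6.2, 5.9, 4.0]
statistic, p_value = haplotype_score_test(dosage, trait)
print(statistic, p_value)
```

Only the null model (here, the trait mean) has to be fitted, which is why a score test is convenient when many haplotype combinations are to be screened.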
Third, association analysis between multi-region haplotypes and complex traits based on a semiparametric regression model
The genetic pattern of a complex trait generally involves multiple genes and the interactions among them. We propose a new statistical method that works at the haplotype level to identify multiple genomic regions affecting variation in a continuous trait. The method uses a semiparametric regression model with a kernel function, which can consider a large number of genes at the same time and reduces dimension more effectively than existing methods. For parameter estimation and testing of the nonparametric function we follow Liu et al. and Kwee et al.: the parameters are estimated by the least squares kernel machine (LSKM), and the nonparametric function is tested with a score statistic. The best combination of genes or regions is selected with a step-down P-value procedure. Simulation studies demonstrate the accuracy of the method and its power to detect multiple genes. We applied the method to the association analysis between KLK3 expression in the human prostate cancer pathway and 339 candidate genes and found the group of genes that affect KLK3 expression, obtaining more information than the single-gene analysis in the previous part. We also applied the method to study the genetic mechanism of pork quality.
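A kernel-based score statistic for the nonparametric function can be sketched as a quadratic form in the null-model residuals. The version below assumes an intercept-only null model and a kernel matrix supplied by the caller, and it replaces the analytic null distribution used in the kernel machine literature with a simple permutation P-value. The function names, the toy kernel, and the data are hypothetical.

```python
import random


def kernel_score_statistic(trait, K):
    """Quadratic form r' K r, where r are residuals from an intercept-only model."""
    n = len(trait)
    mean = sum(trait) / n
    r = [y - mean for y in trait]
    return sum(r[i] * K[i][j] * r[j] for i in range(n) for j in range(n))


def permutation_pvalue(trait, K, n_perm=999, seed=0):
    """Permutation P-value: shuffle the trait to break any trait-kernel association."""
    rng = random.Random(seed)
    observed = kernel_score_statistic(trait, K)
    shuffled = list(trait)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance so ties are counted despite floating-point rounding.
        if kernel_score_statistic(shuffled, K) >= observed - 1e-9:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy kernel: subjects 0-2 share one haplotype background, subjects 3-5 another.
K = [[1.0 if (i < 3) == (j < 3) else 0.0 for j in range(6)] for i in range(6)]
trait = [1.2, 0.9, 1.1, 3.0, 3.4, 2.8]
observed, p_value = permutation_pvalue(trait, K)
print(observed, p_value)
```

The statistic is large when subjects that the kernel treats as similar also have similar trait values, which is the pattern expected if the regions summarized by the kernel affect the trait.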
Fourth, association analysis between multi-region haplotypes and binary traits based on a semiparametric logistic kernel model
New statistical methods for testing the genetic pathways of disease are receiving growing attention. The reason is that genes in a pathway tend to interact with one another, and traditional parametric estimation becomes infeasible because the dimension is too large, which makes nonparametric methods preferable. By fitting high-dimensional genomic haplotype information with a kernel machine function, we propose an efficient and flexible semiparametric logistic model for testing genetic pathways associated with disease. Following Liu et al., we express the semiparametric model as a logistic mixed model, estimate the parameters with existing statistical software, and test the nonparametric function with a score statistic. Simulation studies demonstrate the accuracy of the method and its power to test disease pathways. The method was applied to a pathway analysis of data from multiple myeloma patients with osteonecrosis of the jaw under phosphate treatment.
Fifth, association analysis between multi-region haplotypes and longitudinal traits based on a semiparametric regression model
In studies of longitudinal data with repeated records, it is important to account for both time and other covariates that affect the trait. Building on the semiparametric model for longitudinal data of Zhang et al., we use the parametric fixed effects of the model to fit haplotype effects and other fixed covariate effects. The parameters are estimated by the method of Zhang et al., and the haplotype effect is tested with a likelihood ratio test. Simulation comparisons of our modified method with a general mixed model, Haplo.stats, and the htr model of FAMHAP show that, for dynamic traits, accounting for the time effect in repeatedly sampled data detects haplotype effects better than analyzing a single sample. We used this semiparametric regression model to analyze the repeated reproductive records of sows in relation to haplotypes of the MMP1 and MMP10 genes.
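The likelihood ratio test of the haplotype effect compares the maximized log-likelihood of the model without haplotype terms against the model with them. The sketch below assumes both log-likelihoods have already been obtained from fitted models (fitting the longitudinal model itself is not shown) and that the usual chi-square reference distribution applies; the function names and the numbers are made up for illustration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # df = 2 gives exp(-x/2); each extra 2 df adds (x/2)^k / k! * exp(-x/2).
        term = math.exp(-half)
        total = term
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return total
    # df = 1 gives erfc(sqrt(x/2)); each extra 2 df adds one more term.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2 * x / math.pi) * math.exp(-half)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return total


def likelihood_ratio_test(loglik_null, loglik_full, df):
    """LRT statistic 2 * (full - null) and its chi-square P-value.

    df: number of haplotype parameters added in the full model.
    """
    statistic = max(0.0, 2 * (loglik_full - loglik_null))
    return statistic, chi2_sf(statistic, df)


# Made-up log-likelihoods; the full model adds two haplotype effects.
statistic, p_value = likelihood_ratio_test(-520.4, -515.1, df=2)
print(statistic, p_value)
```

Because the two models differ only in the haplotype terms, time and the other covariates are accounted for in both fits and the test isolates the haplotype contribution.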
In summary, to address existing problems in genome research, this thesis establishes an association analysis, based on a generalized linear model, between complex traits and the joint haplotypes of multiple unlinked regions, and kernel-based semiparametric regression models for analyzing the genetic patterns of static and dynamic data. Simulation studies confirmed the reliability of the models, which were applied to several real examples. The results can advance candidate gene studies of complex traits and lay a theoretical foundation for genome-level studies of the genetic pathways of complex traits. The algorithms have also been implemented as freely downloadable software, giving researchers more comprehensive and accurate tools for genome association analysis.
[Degree-granting institution]: Shanghai Jiao Tong University
[Degree level]: Doctoral
[Year conferred]: 2009
[Classification number]: R346
Article ID: 2219030
Article link: https://www.wllwen.com/yixuelunwen/shiyanyixue/2219030.html