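Haplotype-Based Association Analysis Methods
Published: 2018-01-25 05:17
Keywords: haplotype association analysis; logistic regression; haplotype clustering; case-control study; U-statistic; entropy. Source: Northeast Normal University, doctoral dissertation, 2011. Document type: degree thesis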
[Abstract]: The completion of the Human Genome Project has greatly enriched human genetic data resources in both quantity and quality, but it is also easy to get lost in this vast sea of information. Statistics, as a powerful tool for data analysis, is receiving increasing attention and plays an irreplaceable role in genetic epidemiology research.
Association analysis seeks to find and localize disease-causing genes, mainly by studying the statistical correlation between genetic markers and observable traits, and it has played an important role in improving our understanding of the genetic basis of disease. Haplotypes, a common type of data, are considered to carry more linkage disequilibrium (LD) information. Compared with other methods, haplotype-based association analysis has greater power to identify disease associations, especially for rare diseases in case-control studies. However, when these haplotypes are modeled, rare haplotypes cause many statistical problems: the large number of parameters reduces power and lowers efficiency. Haplotype clustering is a good way to overcome these problems. This thesis focuses on how, in haplotype-based association analysis, information at individual loci and between loci can be used effectively to increase the power of a test, and it presents one parametric method and one nonparametric method.
The thesis first introduces a method for association analysis based on haplotype clustering, called APEG, which clusters haplotypes effectively and sensibly by applying the AP algorithm with the EG distance. The EG distance is a newly proposed similarity measure designed for haplotype data, and it can exploit structural information both at individual loci and between loci. Studies on simulated and real data show that APEG has greater power than other existing methods for detecting association between haplotypes and disease, and that it also yields fairly accurate estimates in gene mapping. The thesis then introduces U-EGS, a nonparametric method based on U-statistics, whose advantages are asymptotic normality and the absence of any assumption about the distribution of the sampled population. The new kernel function EGS introduced in U-EGS is a generalization of the EG distance and likewise makes use of locus information. Subsequent simulation studies confirm that, under different parameter settings and for different disease models, the U-statistic using the EGS kernel, which incorporates locus information, has greater statistical power than U-statistics that do not use locus information.
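As a rough illustration of the kind of quantity a U-statistic method works with, the sketch below computes a degree-2 U-statistic (the mean of a kernel over all unordered pairs) separately for cases and controls. The abstract does not define the EGS kernel, so `matching_kernel` is a hypothetical stand-in that simply counts matching alleles. The function names, the string encoding of haplotypes, and the toy data are all assumptions made for this example, not part of the thesis.

```python
from itertools import combinations


def matching_kernel(h1, h2):
    """Fraction of loci at which two haplotypes carry the same allele.

    Placeholder similarity for illustration only; this is NOT the EGS
    kernel from the thesis, whose definition the abstract does not give.
    """
    if len(h1) != len(h2):
        raise ValueError("haplotypes must have the same number of loci")
    return sum(a == b for a, b in zip(h1, h2)) / len(h1)


def u_statistic(haplotypes, kernel=matching_kernel):
    """Degree-2 U-statistic: mean of the kernel over all unordered pairs."""
    pairs = list(combinations(haplotypes, 2))
    if not pairs:
        raise ValueError("need at least two haplotypes")
    return sum(kernel(a, b) for a, b in pairs) / len(pairs)


# Toy data (hypothetical): each haplotype is a string of 0/1 alleles.
cases = ["0110", "0110", "0111"]
controls = ["0110", "1001", "1100"]

u_cases = u_statistic(cases)
u_controls = u_statistic(controls)

# A large gap in average pairwise similarity between the two groups
# hints at association; a real test would also need a variance estimate.
diff = u_cases - u_controls
```

Passing a different function as `kernel` is the only change needed to try another similarity measure, which is where a locus-aware kernel such as EGS would plug in.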
[Degree-granting institution]: Northeast Normal University
[Degree level]: Doctoral
[Year degree granted]: 2011
[Classification number]: R346
[Citing literature]
Related master's theses (top 1)
1 Tong Liang; Interval mapping of QTL when genotypes contain errors [D]; Heilongjiang University; 2013
Article ID: 1462092
Article link: https://www.wllwen.com/xiyixuelunwen/1462092.html