
Research on Statistical Analysis Methods for Medical Genetic Data and Their Implementation in SAS

Published: 2018-09-14 07:01
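[Summary]: Mathematical statistics has played an irreplaceable role in the development of medical genetics. As basic medicine has advanced and genetic laboratory techniques have been continually updated, many genetic statistical analysis techniques have matured and come into ever wider use, while new analysis methods keep emerging. How to apply the new, more complex methods, and how to compute the mature, widely used ones quickly, are problems facing medical genetics researchers today. This study examines statistical analysis methods for medical genetic data in some detail, in particular the correction of genetic results for multiple comparisons, association studies between multiple loci and disease, and linkage analysis. Based on repeated calculation, it puts forward its own views, and it implements all of the methods in SAS, an internationally authoritative statistical analysis package, by calling procedure steps and by programming.
Addressing the main statistical analysis methods currently used in medical genetics, this study concentrates on the following parts:
Part one: estimating gene frequencies and genotype frequencies and testing the Hardy-Weinberg equilibrium law.
The Hardy-Weinberg equilibrium law plays a very important role in genetics research. Before genotype data are analyzed, it is best to test whether they conform to the law. This chapter introduces the basic theory of the Hardy-Weinberg equilibrium law and uses software to calculate gene and genotype frequencies, test the law, and correct the probability by Monte Carlo simulation.
Part two: using the case-control method to find loci associated with disease.
The case-control study is one of the most basic and important designs in analytical epidemiology and an important tool for testing etiological hypotheses. In genetic epidemiology, case-control studies can be used to find genes associated with complex diseases. Either the ordinary χ² test or the Armitage trend test can be used. Using the ordinary χ² test to assess the association between a disease and a locus requires the tested population to satisfy the Hardy-Weinberg equilibrium law. Studies have shown that if the law does not hold, the type I error of the χ² test increases, so the Armitage trend test should be applied to the genotype data instead.
Part three: correction of genetic analysis results.
With the rapid development of biotechnology, rapid laboratory testing of large numbers of loci has become routine in the analysis of case-control genetic epidemiological data. Each locus requires its own statistical test. With many loci, multiple comparisons inflate the false-positive rate sharply and make the conclusions unreliable, so the results must be corrected for multiple comparisons. This chapter applies three smoothing correction methods as well as the Bonferroni and Sidak corrections.
Part four: association analysis of family data.
Using family members as controls is the best way to match on ancestral origin. Controls drawn from family members with the same genetic background resolve the problem of population stratification well. The analysis method depends on which family members are available. This chapter applies the TDT, s-TDT and SDT tests to family-based case-control data.
Part five: linkage disequilibrium and haplotype analysis.
Linkage disequilibrium analysis and haplotype analysis are highly efficient methods for the fine mapping of disease-associated genes and have played a major role in detecting genes for complex diseases. For data collection, they do not require family data, which distinguishes them from family-based association analysis and makes them applicable under broader conditions. This chapter studies in detail methods for detecting linkage disequilibrium and the analysis of association between haplotypes and disease.
Part six: calculation of the inbreeding coefficient and the coefficient of relationship.
Consanguineous marriage is non-random mating. It seriously disturbs the gene equilibrium law in a population and changes the proportions of homozygotes and heterozygotes. The Hardy-Weinberg law applies only to randomly mating populations and not to populations of this kind. This chapter calculates the inbreeding coefficient and the coefficient of relationship for consanguineous marriages.
Part seven: linkage analysis.
During the formation of an individual's germ cells, the frequency with which homologous chromosomes exchange segments in meiosis is called the recombination rate. The recombination rate depends on the distance between two loci on the same chromosome: in general, the farther apart they are, the more opportunities for exchange and the higher the recombination rate. When the recombination rate reaches 0.50, the two loci segregate independently, as loci on different chromosomes do. A low recombination rate indicates that the two loci lie close together and that their alleles are not transmitted to the next generation independently; in genetics this phenomenon is called linkage. This chapter mainly introduces the Bayesian method and Monte Carlo simulation for estimating the recombination rate.
This work used multiple procedure steps from the genetics and stat modules of SAS 9.1.3 and SAS 9.2, together with programming, to perform the statistical computations on medical genetic data. It combines statistical model theory with worked examples, theoretical study with software implementation, and mathematical methods with genetic laboratory techniques. Proceeding from the simple to the complex, it systematically introduces the various genetic statistical analysis methods together with their statistical models and computational principles, elaborates in particular on the correction of genetic results, the association of multiple loci with disease, and linkage analysis, and puts forward new viewpoints. By emphasizing practical application techniques and convenient implementation, it provides not only statistical methodology for medical genetics but also a new platform for data computation in this branch of the discipline.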
[Abstract]: Mathematical statistics has played an irreplaceable role in the development of medical genetics. With advances in basic medicine and the continual renewal of genetic laboratory techniques, many genetic statistical analysis techniques have matured and are ever more widely applied, while new analysis methods keep emerging. How to apply the new, more complex methods, and how to compute the mature, widely used ones quickly, are problems facing medical genetics researchers today. This study examines statistical analysis methods for medical genetic data, in particular the correction of genetic results for multiple comparisons, association studies between multiple loci and disease, and linkage analysis. Based on repeated calculation, it puts forward its own views, and it implements all of the methods in the SAS statistical analysis software by calling procedure steps and by programming.
Addressing the main statistical analysis methods in medical genetics, this study focuses on the following parts:
Part one: estimating gene frequencies and genotype frequencies and testing the Hardy-Weinberg equilibrium law.
The Hardy-Weinberg equilibrium law plays a very important role in genetics research. Before genotype data are analyzed, it is best to test whether they conform to the law. This chapter introduces the basic theory of the Hardy-Weinberg equilibrium law and uses software to calculate gene and genotype frequencies, test the law, and correct the probability by Monte Carlo simulation.
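The calculations named in part one can be sketched in a few lines. The following is a minimal Python illustration for a single biallelic locus, not the thesis's SAS code: the function names are hypothetical, genotype counts are assumed to be given as (AA, AB, BB), and the Monte Carlo step re-pairs shuffled alleles, which is one common way to obtain a simulated p-value.

```python
import math
import random


def allele_frequencies(n_aa, n_ab, n_bb):
    """Return (p, q), the frequencies of alleles A and B, from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    return p, 1.0 - p


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test of Hardy-Weinberg equilibrium (1 df).

    Assumes both alleles are present, so that no expected count is zero.
    """
    n = n_aa + n_ab + n_bb
    p, q = allele_frequencies(n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


def hwe_monte_carlo_p(n_aa, n_ab, n_bb, n_sim=2000, seed=0):
    """Monte Carlo p-value: shuffle all alleles and pair them at random."""
    rng = random.Random(seed)
    n = n_aa + n_ab + n_bb
    n_a = 2 * n_aa + n_ab                      # copies of allele A
    alleles = ["A"] * n_a + ["B"] * (2 * n - n_a)
    observed, _ = hwe_chi_square(n_aa, n_ab, n_bb)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(alleles)
        # Consecutive pairs of the shuffled list form simulated genotypes.
        het = sum(alleles[i] != alleles[i + 1] for i in range(0, 2 * n, 2))
        aa = (n_a - het) // 2
        stat, _ = hwe_chi_square(aa, het, n - het - aa)
        if stat >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_sim + 1)


if __name__ == "__main__":
    print(allele_frequencies(30, 40, 30))
    print(hwe_chi_square(30, 40, 30))
    print(hwe_monte_carlo_p(30, 40, 30))
```

The simulated p-value is useful when some expected genotype counts are small and the chi-square approximation is doubtful.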
Part two: using the case-control method to find loci associated with disease.
The case-control study is one of the most basic and important designs in analytical epidemiology and an important tool for testing etiological hypotheses. In genetic epidemiology, case-control studies can be used to find genes associated with complex diseases. Studies have shown that if the Hardy-Weinberg equilibrium law does not hold, the type I error of the χ² test increases, so the Armitage trend test should be applied to the genotype data instead.
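To make the contrast between the two tests concrete, here is a minimal Python sketch, not the thesis's SAS code. It computes the Cochran-Armitage trend chi-square on a 2 x 3 genotype table and, for comparison, the ordinary chi-square on the 2 x 2 table of allele counts. The function names, the additive scores (0, 1, 2) and the example counts are assumptions made for illustration.

```python
import math


def chi2_1df_p(chi2):
    """Upper-tail probability of a chi-square statistic with 1 df."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def armitage_trend(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend test on genotype counts.

    cases, controls: counts per genotype, ordered by copies of the risk allele.
    """
    r_total = sum(cases)
    s_total = sum(controls)
    n_total = r_total + s_total
    col = [r + s for r, s in zip(cases, controls)]      # genotype totals
    sum_tr = sum(t * r for t, r in zip(scores, cases))
    sum_tn = sum(t * n for t, n in zip(scores, col))
    sum_t2n = sum(t * t * n for t, n in zip(scores, col))
    num = n_total * (n_total * sum_tr - r_total * sum_tn) ** 2
    den = r_total * s_total * (n_total * sum_t2n - sum_tn ** 2)
    chi2 = num / den
    return chi2, chi2_1df_p(chi2)


def allelic_chi_square(cases, controls):
    """Pearson chi-square on allele counts; its validity relies on HWE."""
    a = 2 * cases[0] + cases[1]           # cases, non-risk allele
    b = cases[1] + 2 * cases[2]           # cases, risk allele
    c = 2 * controls[0] + controls[1]     # controls, non-risk allele
    d = controls[1] + 2 * controls[2]     # controls, risk allele
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, chi2_1df_p(chi2)


if __name__ == "__main__":
    cases, controls = (10, 20, 30), (30, 20, 10)
    print(armitage_trend(cases, controls))
    print(allelic_chi_square(cases, controls))
```

The allelic test treats the two alleles of one person as independent observations, which is the step that depends on Hardy-Weinberg equilibrium; the trend test works on individuals and avoids it.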
Part three: correction of genetic analysis results.
With the rapid development of biotechnology, rapid laboratory testing of large numbers of loci has become routine in case-control genetic epidemiology. Each locus requires its own statistical test. With many loci, multiple comparisons inflate the false-positive rate sharply and make the conclusions unreliable, so the results must be corrected for multiple comparisons. This chapter applies three smoothing correction methods as well as the Bonferroni and Sidak corrections.
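The two named corrections are simple enough to show directly. The sketch below is a Python illustration with hypothetical function names, not the thesis's SAS code; it assumes m independent tests when computing the family-wise error rate, and it does not attempt the three smoothing methods, which the abstract does not specify.

```python
def familywise_error(alpha, m):
    """Chance of at least one false positive among m independent tests."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def sidak(p_values):
    """Sidak-adjusted p-values: 1 - (1 - p)^m."""
    m = len(p_values)
    return [1.0 - (1.0 - p) ** m for p in p_values]


if __name__ == "__main__":
    raw = [0.001, 0.01, 0.04, 0.5]
    print(familywise_error(0.05, 100))
    print(bonferroni(raw))
    print(sidak(raw))
```

With 100 loci tested at the 0.05 level, the uncorrected chance of at least one false positive is above 0.99, which is the inflation the chapter sets out to control. The Sidak adjustment is never larger than the Bonferroni one.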
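Part four: association analysis of family data.
Using family members as controls is the best way to match on ancestral origin. Controls drawn from family members with the same genetic background resolve the problem of population stratification well. The analysis method depends on which family members are available. This chapter applies the TDT, s-TDT and SDT tests to family-based case-control data.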
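As an illustration of family-based testing, here is a minimal Python sketch of the classical transmission disequilibrium test (TDT), one of the tests this thesis applies to family data. It is not the thesis's SAS code. It assumes parent-child trios with an affected child, one biallelic marker, and genotypes coded as the number of copies of allele A (0, 1 or 2); the function names are hypothetical, and the s-TDT and SDT variants are not covered.

```python
import math


def count_transmissions(trios):
    """Count transmissions of allele A from heterozygous parents.

    trios: iterable of (father, mother, child) genotypes, each 0, 1 or 2.
    Returns (b, c): A transmitted and A not transmitted by heterozygous parents.
    """
    b = c = 0
    for father, mother, child in trios:
        parents = (father, mother)
        n_het = parents.count(1)       # informative parents
        n_hom_a = parents.count(2)     # these parents always transmit A
        from_het = child - n_hom_a     # copies of A received from het parents
        if not 0 <= from_het <= n_het:
            raise ValueError("Mendelian inconsistency in trio "
                             f"{(father, mother, child)}")
        b += from_het
        c += n_het - from_het
    return b, c


def tdt(b, c):
    """McNemar-type TDT chi-square (1 df) and its p-value."""
    if b + c == 0:
        raise ValueError("no heterozygous parents, test is undefined")
    chi2 = (b - c) ** 2 / (b + c)
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


if __name__ == "__main__":
    trios = [(1, 0, 1), (1, 1, 2), (1, 2, 2), (2, 0, 1), (1, 1, 1)]
    b, c = count_transmissions(trios)
    print(b, c, tdt(b, c))
```

Only heterozygous parents contribute, and each is compared with its own untransmitted allele, which is why the test is robust to population stratification.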
Part five: linkage disequilibrium and haplotype analysis.
Linkage disequilibrium analysis and haplotype analysis are highly efficient methods for the fine mapping of disease-associated genes and have played a major role in detecting genes for complex diseases. This chapter studies in detail methods for detecting linkage disequilibrium and the analysis of association between haplotypes and disease.
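The usual pairwise measures of linkage disequilibrium can be sketched as follows. This Python illustration is not the thesis's SAS code and makes a simplifying assumption: the four haplotype counts for two biallelic loci are already known (phased). With unphased genotype data the haplotype frequencies would first have to be estimated, which is not shown. The function name and the dictionary keys are hypothetical.

```python
import math


def ld_measures(n11, n12, n21, n22):
    """Pairwise LD from haplotype counts.

    n11: A-B haplotypes, n12: A-b, n21: a-B, n22: a-b.
    Assumes both loci are polymorphic in the sample.
    """
    n = n11 + n12 + n21 + n22
    p_a = (n11 + n12) / n                  # frequency of allele A
    p_b = (n11 + n21) / n                  # frequency of allele B
    d = n11 / n - p_a * p_b                # disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    chi2 = n * r2                          # 1-df test of D = 0
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return {"D": d, "D_prime": d_prime, "r2": r2,
            "chi2": chi2, "p_value": p_value}


if __name__ == "__main__":
    print(ld_measures(60, 15, 10, 65))
```

D' rescales D by its largest attainable magnitude given the allele frequencies, while r² is the squared correlation between the two loci and is the quantity tied to the chi-square statistic.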
Part six: calculation of the inbreeding coefficient and the coefficient of relationship.
Consanguineous marriage is non-random mating. It seriously disturbs the gene equilibrium law in a population and changes the proportions of homozygotes and heterozygotes.
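Both coefficients can be derived from a pedigree with the standard recursive kinship calculation. The Python sketch below is an illustration under stated assumptions, not the thesis's SAS code: the pedigree is a dictionary mapping each person to (father, mother), founders have (None, None), parents must be listed before their children, and all names are hypothetical.

```python
import math
from functools import lru_cache


def make_kinship(pedigree):
    """Build a kinship function for a pedigree {name: (father, mother)}.

    Entries must be ordered so that parents precede their children.
    """
    order = {name: idx for idx, name in enumerate(pedigree)}

    @lru_cache(maxsize=None)
    def kinship(a, b):
        if a is None or b is None:
            return 0.0                      # unknown parents are unrelated
        if a == b:
            father, mother = pedigree[a]
            return 0.5 * (1.0 + kinship(father, mother))
        if order[a] < order[b]:
            a, b = b, a                     # recurse on the later-listed person
        father, mother = pedigree[a]
        return 0.5 * (kinship(father, b) + kinship(mother, b))

    return kinship


def inbreeding(pedigree, kinship, name):
    """Inbreeding coefficient F: the kinship of the person's parents."""
    father, mother = pedigree[name]
    return kinship(father, mother)


def relationship(pedigree, kinship, a, b):
    """Coefficient of relationship r between two people."""
    f_a = inbreeding(pedigree, kinship, a)
    f_b = inbreeding(pedigree, kinship, b)
    return 2.0 * kinship(a, b) / math.sqrt((1.0 + f_a) * (1.0 + f_b))


if __name__ == "__main__":
    # A first-cousin marriage: C1 and C2 share the grandparents GF and GM.
    ped = {
        "GF": (None, None), "GM": (None, None),
        "P1": ("GF", "GM"), "P2": ("GF", "GM"),
        "S1": (None, None), "S2": (None, None),
        "C1": ("P1", "S1"), "C2": ("P2", "S2"),
        "X": ("C1", "C2"),
    }
    k = make_kinship(ped)
    print(inbreeding(ped, k, "X"), relationship(ped, k, "C1", "C2"))
```

For the first-cousin example, the child's inbreeding coefficient comes out as 1/16 and the cousins' coefficient of relationship as 1/8, the textbook values.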
Part seven: linkage analysis.
During the formation of an individual's germ cells, the frequency with which homologous chromosomes exchange segments in meiosis is called the recombination rate. The recombination rate depends on the distance between two loci on the same chromosome: in general, the farther apart they are, the more opportunities for exchange and the higher the recombination rate. When the recombination rate reaches 0.50, the two loci segregate independently, as loci on different chromosomes do. A low recombination rate indicates that the two loci lie close together and that their alleles are not transmitted to the next generation independently; in genetics this phenomenon is called linkage. This chapter mainly introduces the Bayesian method and Monte Carlo simulation for estimating the recombination rate.
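A very small example may help show how Bayesian estimation and Monte Carlo simulation fit together here. The Python sketch below is an assumption-laden illustration, not the procedure used in the thesis: it supposes phase-known meioses in which recombinants can be counted directly, a binomial likelihood, and a uniform prior on the recombination rate over [0, 0.5]. The posterior mean is approximated by drawing from the prior and weighting each draw by its likelihood. All function names are hypothetical.

```python
import math
import random


def likelihood(theta, n_rec, n_mei):
    """Binomial likelihood kernel for n_rec recombinants in n_mei meioses."""
    return theta ** n_rec * (1.0 - theta) ** (n_mei - n_rec)


def mle_theta(n_rec, n_mei):
    """Maximum-likelihood estimate, restricted to the range [0, 0.5]."""
    return min(n_rec / n_mei, 0.5)


def lod_score(theta, n_rec, n_mei):
    """LOD score: log10 likelihood ratio of theta against free recombination.

    Call it only where the likelihood is positive, for example at the MLE.
    """
    return math.log10(likelihood(theta, n_rec, n_mei)
                      / likelihood(0.5, n_rec, n_mei))


def posterior_mean_theta(n_rec, n_mei, n_draws=20000, seed=0):
    """Monte Carlo posterior mean under a uniform prior on [0, 0.5]."""
    rng = random.Random(seed)
    weighted_sum = 0.0
    weight_total = 0.0
    for _ in range(n_draws):
        theta = rng.uniform(0.0, 0.5)            # draw from the prior
        w = likelihood(theta, n_rec, n_mei)      # weight by the likelihood
        weighted_sum += w * theta
        weight_total += w
    return weighted_sum / weight_total


if __name__ == "__main__":
    n_rec, n_mei = 2, 20
    theta_hat = mle_theta(n_rec, n_mei)
    print(theta_hat, lod_score(theta_hat, n_rec, n_mei))
    print(posterior_mean_theta(n_rec, n_mei))
```

Real pedigree data are usually phase-unknown, in which case the likelihood is more involved than this binomial form; the weighting scheme stays the same.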
This work used multiple procedure steps from the genetics and stat modules of SAS 9.1.3 and SAS 9.2, together with programming, to perform the statistical computations on medical genetic data. Proceeding from the simple to the complex, it systematically introduces the various genetic statistical analysis methods together with their statistical models and computational principles, elaborates in particular on the correction of genetic results, the association of multiple loci with disease, and linkage analysis, and puts forward new viewpoints. By emphasizing practical application techniques and convenient implementation, it provides not only statistical methodology for medical genetics but also a new platform for data computation in this branch of the discipline.
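[Degree-granting institution]: Academy of Military Medical Sciences of the Chinese People's Liberation Army
[Degree level]: Master's
[Year degree granted]: 2010
[Classification number]: R311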



