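Using Computational Methods to Study the Molecular Regulatory Mechanisms of Disease Mutations
Published: 2018-03-25 18:03
Topic: disease mutations  Entry point: regulatory elements  Source: Anhui University, 2017 master's thesis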
[Abstract]: The development of high-throughput sequencing technology has produced massive amounts of biological data, but uncovering the biological regularities hidden in these data remains a major challenge. Bioinformatics is an interdisciplinary field that uses statistical analysis, computational methods, and other disciplines to study biology. Gene expression is a highly regulated process and has long been one of the research hotspots of bioinformatics. It can be divided into two major parts, transcription and translation; each stage involves numerous regulatory elements and protein molecules, and an abnormality at any stage may inactivate gene function, affect gene expression, and ultimately lead to disease. Regulatory elements are widely distributed across the genome and are deeply involved in gene expression, so changes in their functional activity have an important effect on gene expression. Gene mutations that fall on regulatory elements can alter the functional activity of those elements and abnormally affect gene expression, which is one of the important molecular mechanisms of pathogenesis. To quantify how strongly mutations in different regulatory elements affect gene expression, this thesis studies the molecular regulatory mechanisms of mutations associated with four categories of disease and finds that different categories of disease mutations have distinct, specific molecular regulatory mechanisms. In addition, sequence pattern mining is used to model the promoter and enhancer sequences among the regulatory elements, in order to further analyze the pathogenic mechanisms of promoter and enhancer mutations. The main work and contributions of this thesis are as follows:
(1) Different categories of disease mutations are enriched in different regulatory element regions. Nine classes of regulatory elements were first obtained from data released by the FANTOM and ENCODE projects, and the distributions of the different classes across the genome were found to differ significantly. Four categories of disease mutation data were then obtained from databases including OMIM, GWAS, ClinVar, and VarDi: genetic disease mutations, cancer-predisposing germline mutations, cancer somatic mutations, and complex disease mutations. Counting how the four categories of disease mutations are distributed across the nine classes of regulatory elements shows that genetic disease mutations are enriched in promoters; cancer mutations are enriched in promoters, methylated regions, and regions of physical chromosomal interaction; and complex disease mutations are evenly distributed across the nine classes.
(2) A sequence pattern mining model is used to study the pathogenic mechanisms of promoter and enhancer mutations and to quantify the effect of mutations on the functional activity of promoters and enhancers. Gene sequence data contain abundant regulatory sequences, which exert regulatory functions during gene expression and give rise to different protein products. Drawing on both the variability and the conservation of sequences, this thesis combines frequent pattern mining with a PSSM (position-specific scoring matrix) model to model promoters and enhancers, yielding quantitative measures of promoter signal strength and enhancer signal strength. Computational validation experiments show that the model effectively distinguishes true from false promoters and enhancers. Further analysis of mutations in promoters and enhancers shows that a decrease in promoter signal strength is accompanied by a higher probability of pathogenicity, indicating that promoter single-nucleotide mutations that lower promoter signal strength are positively correlated with disease, whereas changes in signal strength caused by disease mutations in enhancers show no significant correlation with disease occurrence.
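The idea of measuring how much a single-nucleotide mutation changes a sequence's "signal strength" can be illustrated with a plain log-odds PSSM. The sketch below is only an illustration under stated assumptions, not the thesis's model: the thesis combines frequent pattern mining with a PSSM, whereas this example uses a PSSM alone. The function names, the toy motif instances, the uniform background frequency, and the pseudocount are all hypothetical choices made for the example.

```python
import math

BASES = "ACGT"


def build_pssm(sequences, background=0.25, pseudocount=1.0):
    """Build a log-odds PSSM from equal-length, aligned DNA sequences.

    Assumes a uniform background frequency and a simple additive
    pseudocount; both are illustrative choices.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")
    pssm = []
    for pos in range(length):
        counts = {b: pseudocount for b in BASES}
        for seq in sequences:
            counts[seq[pos]] += 1
        total = sum(counts.values())
        # Log-odds of the observed base frequency against the background.
        pssm.append(
            {b: math.log2((counts[b] / total) / background) for b in BASES}
        )
    return pssm


def score(pssm, seq):
    """Sum the per-position log-odds scores (the 'signal strength')."""
    return sum(column[base] for column, base in zip(pssm, seq))


def mutation_effect(pssm, seq, pos, alt):
    """Score change caused by substituting base `alt` at 0-based `pos`."""
    mutated = seq[:pos] + alt + seq[pos + 1:]
    return score(pssm, mutated) - score(pssm, seq)


# Toy TATA-box-like motif instances, made up for illustration only.
toy_sites = ["TATAAA", "TATAAA", "TATATA", "TATAAG", "TTTAAA"]
pssm = build_pssm(toy_sites)

# A substitution away from the consensus base lowers the score.
delta = mutation_effect(pssm, "TATAAA", 1, "G")
print(f"score change for A->G at position 1: {delta:.3f}")
```

A negative `delta` corresponds to a mutation that weakens the modeled signal, which is the kind of change the abstract associates with higher pathogenicity in promoters.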
[Degree-granting institution]: Anhui University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: Q811.4;R3416