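Identification and Characterization of Novel DNA Methylation Markers in Embryonic Stem Cells Based on Information Entropy
Published: 2018-05-30 21:53
Topic: embryonic stem cells + DNA methylation markers; Source: Harbin Institute of Technology, doctoral dissertation, 2015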
[Abstract]: Embryonic stem cells (ESCs) have multi-lineage differentiation potential and developmental totipotency. DNA methylation plays an important role in ESC self-renewal and differentiation, so identifying and functionally analyzing novel ESC-specific methylation markers is important for understanding the complex, finely tuned regulatory networks that determine cell fate. Accordingly, integrating high-throughput, single-base-resolution DNA methylation maps of development generated by bisulfite sequencing, developing accurate and efficient database platforms and bioinformatics software for experimental scientists, and systematically identifying and characterizing novel DNA methylation markers specific to ESCs are of great value for understanding in depth how ESCs maintain pluripotency and undergo directed differentiation. This thesis develops an algorithm based on information entropy theory for quantifying methylation specificity, and systematically evaluates its accuracy, applicability across samples, and resource usage on random and real data. A genome segmentation algorithm is developed from the distance-dependent methylation similarity between neighboring CpG sites, and a t-test-based statistical method is then developed for identifying methylation markers. These algorithms are implemented in Python as SMART, a software tool for methylation specificity analysis and reporting. Comparison with existing methylation analysis software demonstrates the utility of SMART for de novo identification of functional methylation regions in the genome from large-sample methylomes. By integrating high-throughput DNA methylation data, with ESCs as the focus, developmental time as the main axis, and convenience for experimental scientists as the goal, the mouse developmental methylation database DevMouse was built, and fully automated online analysis and visualization tools based on information entropy theory were integrated into it. Human high-throughput DNA methylation data were likewise integrated to build Human MethyDB, an ESC-centered human methylation database. The two developmental methylation databases provide the data foundation for the subsequent work in this thesis and also let experimental scientists carry out bioinformatics analyses of development-related DNA methylation quickly and conveniently. The entropy-based algorithm was applied to DNA methylation and other epigenomic data in DevMouse covering mouse development from ESCs to brain, revealing coordinated changes in DNA methylation and the histone modification H3K27me3. Further, 1,341 differentially methylated CpG islands were identified during mouse brain development, and they overlapped significantly with CpG islands showing differential H3K27me3 modification. Analysis of 429 differentially expressed genes confirmed that DNA methylation together with other histone modifications cooperatively regulates the differential expression of development-related genes, in particular the dynamic regulation of reprogramming transcription factor genes and imprinted genes. The analysis also revealed the potential of intergenic CpG islands as markers of novel genes and as important regulatory elements. SMART was applied to the human DNA methylomes collected in Human MethyDB, identifying 757,887 functional segments de novo. Of these, 75% were uniformly methylated across all cell types, indicating the stability of human genome methylation; uniformly hypermethylated segments were mostly located in repetitive elements, whereas uniformly hypomethylated segments were mostly gene promoter CpG islands. Analysis of highly specific methylation segments showed that ESC-specific hypomethylation can serve as a stable marker of stem cells and also participates in regulating important developmental genes. SMART identified 3,758 DNA methylation markers in human ESCs, and the pluripotent stem cell types were found to share more hypermethylation markers with one another. High-throughput epigenomic data were then used to characterize the ESC methylation markers systematically. Hypomethylation markers in human ESCs were significantly enriched for development-related functions, and analysis of methylation markers longer than 3,500 bp revealed ESC-specific methylation patterns. Omics analyses showed that ESC hypomethylation markers are significantly enriched, in a cell-type-specific manner, for super-enhancer marks (H3K27ac and transcription factor binding sites). Significant overlap was found between hypomethylation markers and super-enhancers in ESCs; the overlapping hypomethylation markers and 71 associated pluripotency genes were identified, and functional enrichment analysis indicated that knocking out these genes can cause developmental abnormalities. Comparative analysis between human and mouse revealed the cross-species conservation of ESC methylation markers. In summary, this thesis develops entropy-based software and databases for methylation analysis that help experimental researchers screen and functionally analyze novel ESC methylation markers. It systematically identifies and characterizes novel DNA methylation markers in mouse and human ESCs, providing a new and important reference for understanding the mechanisms of pluripotency maintenance and directed differentiation in ESCs.
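The entropy-based specificity score and the distance-dependent segmentation of neighboring CpG sites mentioned in the abstract can be illustrated with a rough sketch. This is not the SMART implementation, which the abstract does not spell out. The normalisation 1 − H/log2(N), the function names, and the thresholds `max_gap` and `max_diff` are all assumptions chosen here for illustration only.

```python
import math


def entropy_specificity(levels):
    """Assumed score: 1 - H / log2(N), where H is the Shannon entropy of the
    methylation levels (normalised to sum to 1) across N samples.
    0 means identical in every sample, 1 means methylated in one sample only."""
    n = len(levels)
    if n < 2:
        raise ValueError("need at least two samples")
    total = sum(levels)
    if total == 0:
        return 0.0  # unmethylated everywhere: treat as non-specific
    probs = [x / total for x in levels]
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log2(n)


def hypo_specificity(levels):
    """Specificity of hypomethylation: score the unmethylated fraction instead."""
    return entropy_specificity([1.0 - x for x in levels])


def segment_cpgs(positions, profiles, max_gap=500, max_diff=0.2):
    """Greedy segmentation sketch (hypothetical thresholds).
    positions: sorted CpG coordinates; profiles: per-CpG lists of methylation
    levels, one value per sample. Neighbouring CpGs stay in one segment when
    they are at most max_gap bp apart and their mean absolute methylation
    difference across samples is at most max_diff."""
    segments = []
    start = 0
    for i in range(1, len(positions)):
        gap = positions[i] - positions[i - 1]
        diff = sum(abs(a - b) for a, b in zip(profiles[i], profiles[i - 1]))
        diff /= len(profiles[i])
        if gap > max_gap or diff > max_diff:
            segments.append((positions[start], positions[i - 1]))
            start = i
    if positions:
        segments.append((positions[start], positions[-1]))
    return segments
```

Under these assumptions, a region with the same methylation level in every sample scores 0, and a region unmethylated in a single sample (for example an ESC sample) but fully methylated elsewhere gets a hypomethylation specificity of 1. A t-test on the per-sample levels of each segment, as the abstract describes, would then be a separate step and is not shown here.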
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Doctoral
[Year degree conferred]: 2015
[Classification number]: R321
Article ID: 1956947
Article link: https://www.wllwen.com/yixuelunwen/jichuyixue/1956947.html