Multi-omics integrative analysis of transcription factor regulation and epigenetic modifications during lineage formation
Published: 2019-04-15 21:44
[Abstract]: The completion of the Human Genome Project in 2003 marked a new milestone in the life sciences and opened the door to the study of genome sequences. In recent years, advances in sequencing technology have produced massive amounts of high-throughput data, and large-scale projects such as ENCODE, Roadmap Epigenomics and modENCODE have emerged. Sequencing data on gene expression, DNA methylation, histone modifications, DNase I hypersensitive sites and other features provide a broad research platform, yet extracting useful information from these data and interpreting it biologically remains an open scientific problem. A growing body of evidence shows that early lineage formation, that is, preimplantation embryonic development, involves a series of dramatic chromatin reprogramming events. This reprogramming not only mediates the reactivation of gene transcription but also shapes the developmental potency of embryonic stem cells, laying the foundation for later embryonic development. But how does reprogramming proceed, and which factors play important roles? To address these questions, we examined the distribution of open chromatin during preimplantation embryonic development, analyzed how transcription factor regulation and epigenetic modifications change within open regions, and proposed hypotheses about the factors that influence cell fate in early development. We first analyzed the genome-wide properties of open chromatin and their dynamics during preimplantation development. We found that, as the embryo develops, an increasing proportion of open regions comes from coding regions of the genome, and that at the first cell-fate decision the chromatin of the inner cell mass (ICM) is the most active: its open regions are the shortest and lie closest to transcription start sites (TSSs).
Next, we scanned open regions for transcription factor binding sites, and cluster analysis showed that the transcription factors in open regions are specific to developmental stages. Finally, combining these results with histone modification data, we examined how multi-omics signals change during development and proposed a model of cell-fate determination. These studies indicate that, at the very beginning of life, transcription factor regulation and epigenetic modifications strongly influence reprogramming in the preimplantation embryo. We then asked how, after implantation, transcription factor regulation and epigenetic modifications influence the differentiation of embryonic stem cells into diverse cell types that form functional tissues. Although transcription factors are key proteins that regulate gene expression, each of their binding sites spans only a few to a dozen or so base pairs. Nevertheless, transcription factor binding sites are regionally clustered: less than 2% of the genome harbors more than 90% of them. Regions with such dense transcription factor occupancy have been identified in the Drosophila, Caenorhabditis elegans and human genomes and defined as HOT (high-occupancy target) regions; for example, ChIP-seq data for 22 transcription factors revealed 304 HOT regions in C. elegans, each bound by more than 15 of these factors. Although these transcription factors are concentrated in small regions of the genome, how they interact to carry out their functions, and how they affect human disease and cancer, remains unknown.
To explore these questions, we developed an algorithm that identifies genomic HOT regions from DNase I hypersensitive site (DHS) data, validated it against experimentally derived HOT regions, and ultimately identified HOT regions in 154 representative cell lines. We then characterized the developmental and differentiation-related functions of HOT regions from several perspectives and analyzed the dynamics of HOT regions and their epigenetic modifications as embryonic stem cells differentiate into four terminal cell types. Building on this identification and annotation, we further explored the relationship between HOT regions and human disease and cancer. Using GWAS SNP data, we found that disease- and phenotype-associated variants tend to be specifically enriched in the HOT regions of pathologically relevant cells and tissues. Taking hematopoietic differentiation as an example, we discussed in detail how this disease- and phenotype-specific enrichment changes, and we analyzed several important diseases and cancers to further illustrate the significance of HOT regions. Finally, we investigated the relationship between HOT regions and some carcinogenic mechanisms, and found that tumor formation may require tumor-cell-specific HOT regions to regulate the expression of relevant pathogenic genes.
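The abstract does not state which statistic was used to test GWAS SNP enrichment. One common approach, shown here as a minimal sketch under that assumption, compares the fraction of trait-associated SNPs falling inside a cell type's HOT regions with the fraction of the genome those regions cover, using a one-sided binomial test. It assumes non-overlapping half-open intervals in a single linear coordinate system; the function names and toy numbers are hypothetical.

```python
import math


def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k, n + 1)
    ]
    top = max(log_terms)
    return math.exp(top) * sum(math.exp(t - top) for t in log_terms)


def snp_enrichment(snps: list[int], regions: list[tuple[int, int]], genome_size: int):
    """Fold enrichment and one-sided binomial p-value for SNPs inside regions.

    Assumes `regions` are non-overlapping half-open (start, end) intervals in
    the same coordinate system as the SNP positions.
    """
    p = sum(end - start for start, end in regions) / genome_size  # expected fraction
    inside = sum(any(s <= x < e for s, e in regions) for x in snps)
    fold = (inside / len(snps)) / p
    return fold, binom_sf(inside, len(snps), p)


# Toy data (hypothetical): HOT regions cover 1% of a 1 Mb "genome"; 5 of 100 SNPs fall inside.
hot_regions = [(100_000, 105_000), (500_000, 505_000)]
snps = [100_000 + 1_000 * i for i in range(5)] + [600_000 + 1_000 * i for i in range(95)]
fold, pval = snp_enrichment(snps, hot_regions, genome_size=1_000_000)
print(f"fold enrichment = {fold:.1f}, p = {pval:.4f}")
```

Running the same test on the HOT regions of each cell type and comparing the results is one way to ask whether a trait's variants concentrate in the pathologically relevant cells.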
In summary, to investigate the roles of transcription factor regulation and epigenetic modifications during lineage formation, we first used existing data to analyze their properties and dynamics during preimplantation embryonic development. We then identified and annotated genomic hotspots of transcription factor binding and analyzed the properties and dynamics of transcription factor regulation and epigenetic modifications in these hotspots and in open chromatin regions. Finally, we examined the relationship between these hotspots and human disease and cancer. This study focuses on cell lineage formation and on the regulatory functions of non-coding regions; it adds new findings to the study of the non-coding genome and offers a discussion of the factors that determine cell fate.
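[Degree-granting institution]: Academy of Military Medical Sciences of the Chinese People's Liberation Army
[Degree level]: Doctorate
[Year conferred]: 2017
[Classification number]: Q75
Article ID: 2458520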