Construction of a Cancer Predisposition Gene Database and Analysis of Its Copy Number Variation
Published: 2017-12-27 15:09
Source: Anhui University, master's thesis, 2017. Document type: degree thesis
Keywords: cancer predisposition gene; database; copy number variation; gene expression; network module
[Abstract]: Gene mutations can be divided, by where they occur, into somatic mutations and germline mutations. Somatic mutations are transmitted only among somatic cells and are not directly inherited by offspring, whereas germline mutations are passed from generation to generation. Genes that carry germline or epigenetic mutations and thereby increase the risk of cancer are called cancer predisposition genes (CPGs). Identifying CPGs and studying the biological mechanisms behind them can support early prevention, early diagnosis and early treatment of cancer, and also contributes to the search for cancer etiology, the study of pathogenesis and the development of related drugs. Most CPGs act like tumor suppressor genes: cancer arises because gene function is lost. A minority act like oncogenes: a mutation confers a new function, which disrupts the cell cycle and triggers cancer. Over the past decades, with the development and gradual adoption of high-throughput technologies, especially genome-wide mutation analysis (including exome sequencing and whole-genome sequencing), more and more CPGs have been discovered. However, information on these genes and their functions is scattered, and no systematic database of CPGs has existed so far. By collecting and curating CPGs from different sources, we built a comparatively comprehensive CPG database resource. To further analyze copy number variation in CPGs, we also studied the relationship between CPG copy number variation and gene expression in pan-cancer samples. The main work of this thesis is summarized as follows.

1. Construction of the cancer predisposition gene database. To provide a complete resource for exploring CPGs and their molecular mechanisms, we first collected data from five sources: Rahman's data, PubMed, GeneReview, Online Mendelian Inheritance in Man (OMIM) and GeneRIF (Gene Reference Into Function). Through literature reading and analysis, we then collected a total of 827 human CPGs (724 protein-coding genes, 23 non-coding genes and 80 genes for which NCBI currently gives no specific information), together with 637 rat and 658 mouse homologs of the human CPGs. To better understand these genes, we used text mining to systematically collect annotation for each gene in 8 categories: basic information, gene expression, methylation sites, post-translational modifications, germline mutations, interactions, pathway information and drug information. On this basis we built the CPG database website dbCPG (http://bioinfo.ahu.edu.cn:8080/dbCPG/index.jsp), where users can conveniently query, browse, upload and download data. Finally, to assess the functions of the 724 protein-coding human CPGs, we performed enrichment analysis with two online tools, KOBAS and DAVID, and network analysis with the Klein-Ravi algorithm in GenRev. As the first CPG database, dbCPG not only summarizes existing research results but also gives cancer researchers a platform from which data are easier to obtain.

2. Copy number variation of cancer predisposition genes. According to the "two-hit" hypothesis, cancer results from the continuous accumulation of germline and somatic mutations. In cancer biology, therefore, an integrated analysis of germline and somatic mutations is essential for identifying genes and the related molecular pathways. Previous studies have shown that cancer susceptibility may be associated with copy number variation in CPGs. To analyze this systematically, we studied the relationship between somatic copy number variation and expression changes of CPGs in pan-cancer samples. First, using the copy number variation data in The Cancer Genome Atlas (TCGA), we found that 729 CPGs in dbCPG have definite copy number variation information. Further analysis showed that, for 128 of these genes, the number of samples with copy number loss (CNL) was twice the number of samples with copy number gain (CNG). For these 128 genes, we combined the TCGA expression data with the copy number loss data and obtained 49 CPGs showing both copy number loss and decreased expression. Among them, 5 genes showed concordant copy number loss and down-regulated expression in at least 50 tumor samples: MT4P (216 samples), PTEN (143), MCPH1 (86), SMAD4 (63) and MINPP1 (51). This suggests that copy number loss may be a driving force behind the change in gene expression during carcinogenesis. Network analysis of the 49 genes showed that the genes in the extracted subnetwork are closely connected, which in turn suggests that they may share similar biological mechanisms during carcinogenesis. This is the first study of the relationship between copy number loss and down-regulated expression of CPGs in pan-cancer samples. Despite some limitations, these results should help clarify the biological functions of CPGs in carcinogenesis.
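The two filtering steps in part 2 of the abstract (keeping genes whose copy number loss samples outnumber gain samples two to one, then counting samples in which loss and down-regulation coincide) can be sketched as follows. This is a minimal illustration, not the thesis's actual pipeline: the tuple layout, the call labels ("loss", "gain", "down", and so on), the function names and the placeholder genes GENE_A and GENE_B are all assumptions, and "twice" is read here as "at least twice".

```python
from collections import Counter

# Hypothetical per-sample calls: (gene, sample_id, copy_number_call, expression_call).
# The labels and the toy records below are illustrative placeholders, not TCGA data.
CALLS = [
    ("GENE_A", "S1", "loss", "down"),
    ("GENE_A", "S2", "loss", "down"),
    ("GENE_A", "S3", "gain", "up"),
    ("GENE_A", "S4", "loss", "unchanged"),
    ("GENE_B", "S1", "gain", "up"),
    ("GENE_B", "S2", "loss", "down"),
    ("GENE_B", "S3", "gain", "unchanged"),
]


def cnl_dominant_genes(calls, ratio=2.0):
    """Return genes whose loss-sample count is at least `ratio` times the gain-sample count."""
    loss, gain = Counter(), Counter()
    for gene, _sample, cn_call, _expr_call in calls:
        if cn_call == "loss":
            loss[gene] += 1
        elif cn_call == "gain":
            gain[gene] += 1
    # A gene with losses and no gains passes trivially (assumed behaviour).
    return {gene for gene in loss if loss[gene] >= ratio * gain[gene]}


def concordant_counts(calls, genes, min_samples=1):
    """Count, per gene, the samples showing both copy number loss and down-regulation."""
    counts = Counter()
    for gene, _sample, cn_call, expr_call in calls:
        if gene in genes and cn_call == "loss" and expr_call == "down":
            counts[gene] += 1
    return {gene: n for gene, n in counts.items() if n >= min_samples}


if __name__ == "__main__":
    dominant = cnl_dominant_genes(CALLS)
    print(dominant)
    print(concordant_counts(CALLS, dominant, min_samples=2))
```

With real data, `min_samples=50` would correspond to the "at least 50 tumor samples" threshold reported in the abstract; how the thesis actually called losses, gains and expression changes is not stated there, so those definitions remain placeholders.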
[Degree-granting institution]: Anhui University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: R73; Q811.4
Article ID: 1342199
Article link: https://www.wllwen.com/shoufeilunwen/benkebiyelunwen/1342199.html