
Copy Number Variation Analysis and Discovery of Polyadenylation Sites in the Pig Genome

Published: 2018-09-10 19:57
[Abstract]:

1. Analysis of copy number variation during the domestication of Chinese and foreign pig breeds using genome resequencing data

During domestication from wild boar to domestic pig and the subsequent formation of breeds, heredity, variation, and selection have each contributed, to differing degrees, to breed formation and to the differences between breeds. Comparing genome sequences between domestic pigs and wild boars, and among domestic breeds, reveals variation in the genome. Copy number variation (CNV) refers to variation ranging from 1 kb to several Mb in length relative to the normal genomic sequence; its forms include duplications, deletions, and the complex chromosomal microstructural variants derived from them. This study used pig genome resequencing data to analyze CNVs in 49 individuals from 13 Chinese and foreign domestic pig breeds and to analyze the functions of the genes within copy number variation regions (CNVRs). By comparing the CNVs that arose during the domestication of Chinese and foreign breeds, it also examined the CNVs under selection and their associated genes, in order to uncover the genetic basis of phenotypic differences between Chinese and foreign pig breeds. This lays a foundation for further analysis of pig genome variation and for breed improvement. The main results are as follows.

1.1 Bioinformatic analysis of CNVs arising during pig domestication. Individual genome resequencing data for Tongcheng pigs from the authors' laboratory were combined with resequencing data for other pig breeds and wild boars downloaded from public databases, giving 49 individuals in total. CNVs were scanned with CNVseq and with CNVnator. In total, 3131 CNVRs arose during domestication from wild boar to domestic pig: 745 regions of copy number gain, 2364 regions of loss, and 22 regions showing both gain and loss. A genome-wide CNV map of the pig was drawn from the genomic positions of the CNVRs.

1.2 Validation of CNVRs by real-time quantitative PCR. Copy number was validated by real-time quantitative PCR in 28 randomly selected regions. For 24 CNVRs the measured copy number agreed with the predicted gain or loss, a validation rate of 86%.

1.3 Distribution characteristics of CNVs. The number and density of repetitive elements (SINE, LINE, LTR, and others) were analyzed statistically within the 3131 CNVRs and their 10 kb upstream and downstream flanking regions. In these regions the density of each class of repetitive element was significantly higher than the genome-wide average, indicating that CNVs tend to lie near repetitive elements and that repetitive elements have an important influence on CNV formation.

1.4 Functional analysis of CNVR-associated genes in pig domestication. Using BioMart, 1266 protein-coding genes were found within the 3131 CNVRs that arose during domestication. Functional enrichment analysis with DAVID showed that these genes are mainly involved in cell adhesion, GTPase activity, cell junctions, immune response, olfaction, and the MAPK pathway.

1.5 Differential CNVRs arising during the domestication of Chinese and foreign domestic pigs, and the functions of their associated genes. The analysis found 2278 CNVRs in Chinese domestic pigs and 1706 CNVRs in European domestic pigs, of which 129 and 147, respectively, were specific to each group. Functional enrichment analysis of the genes in these CNVRs showed that genes in the CNVRs specific to Chinese domestic pigs are enriched in immune response and production traits, whereas genes in the CNVRs specific to European domestic pigs are enriched in muscle development.

2. Analysis of polyadenylation sites in the pig genome using transcriptome data

Polyadenylation is an important step in the post-transcriptional modification of RNA and plays a key role in mRNA transport and in the translation of mature mRNA. Differences in the number of polyadenylation sites (PASs) on a gene and in the usage of each PAS give rise to alternative polyadenylation (APA), so that a single gene produces multiple transcripts, with important effects on gene expression and function. This study used pig transcriptome data to mine pig PASs at the whole-genome level and, by examining the relationship between PASs and gene expression, further investigated the effect of PASs on traits. The main results are as follows.

2.1 Mining pig polyadenylation sites from large-scale transcriptome data. The data comprised transcriptomes of pulmonary alveolar macrophages from Tongcheng and Large White pigs before and after infection with porcine reproductive and respiratory syndrome virus ("blue ear disease" virus), generated in the authors' laboratory, together with pig transcriptome data downloaded from public databases covering 12 kinds of tissues, cells, sperm, and other samples, for a total of 12 billion reads. Of these, 1.94 million reads containing poly(A) or poly(T) were successfully mapped to the genome. PAS mining on these reads yielded 28363 PASs.

2.2 Positional annotation of PASs. Based on gene positions in the current pig genome annotation, the 28363 PASs were annotated by location. In total, 13033 PASs (47%) lay within 7403 genes; of these, 7900 (61%) were in the 3' UTR, 3441 (26%) in introns, and 2187 (17%) in the ORF. All transcriptome data were then used to predict novel pig transcripts, and the remaining 15330 PASs were annotated against them; 6806 PASs (24%) lay within predicted novel transcripts. Combining the genome annotation with the predicted novel transcripts, 19839 PASs (70%) were intragenic and 8524 PASs (30%) were intergenic.

2.3 Distribution of PASs in the genome and across tissues. Based on gene positions in the pig genome annotation, the distribution of PASs within genes and their 3' UTRs was analyzed. Nearly 41% of genes contained two or more PASs, which can cause a single gene to produce multiple transcripts. Analysis of the distance between adjacent PASs within genes and their 3' UTRs showed that a large share of PASs (45%) lie close together (<1 kb). Analysis of the distance between each 3' UTR PAS and the stop codon showed that PAS position within the 3' UTR varies widely, with a median distance of 307 nt. PAS mining in liver and testis yielded 12777 and 14375 PASs, respectively, but only 4752 PASs were shared by the two tissues, 21% of the total. The usage of the shared PASs also differed greatly between the two tissues, indicating that PASs are tissue-specific.

2.4 Pearson correlation analysis of PASs and gene expression. Transcriptome data from testis and liver tissue with different androgen levels were used. For each dataset, gene expression, the number of PASs within each gene, and the number of reads covering those PASs were tabulated, and the Pearson method was used to test the correlation between PASs and gene expression. The number of PASs within a gene showed a moderate positive correlation with gene expression (0.4<r<0.6, p<0.01), and the number of reads covering PASs showed a strong positive correlation with gene expression (0.6<r<0.8, p<0.01).

2.5 PAS usage in relation to androgen level and to bacterial infection. PASs mined from the different datasets were tested for differential usage according to androgen level in testis and liver. In liver, 272 PASs had significantly higher usage in the low-androgen group than in the high-androgen group (p<0.05, |log2FC|≥1). Functional enrichment analysis of the 109 genes containing these differentially used PASs showed that they are involved in sterol and fatty acid metabolism, steroid hormone synthesis, and cytochrome P450 metabolism (p<0.05). In testis, 260 PASs showed highly significant differences in usage (p<0.05, |log2FC|≥1), corresponding to 163 genes; enrichment analysis showed that many of these genes are involved in both spermatogenesis and the cell cycle (p<0.05). PAS analysis of transcriptome data from before and after Salmonella infection found 38 PASs with higher usage after infection, in 28 genes, and 41 PASs with higher usage before infection, in 26 genes (p<0.05, |log2FC|≥1). Functional enrichment analysis of each gene set showed that genes with PASs used more after infection are involved in immune response and cytokine regulation, whereas genes with PASs used more before infection are involved in translation and other processes not directly related to immune response (p<0.05).
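The criteria quoted in sections 2.3 to 2.5 of the abstract can be illustrated with a short Python sketch. It is not the thesis pipeline, which the abstract does not describe in detail. The function names, the pseudocount of 1, and the choice of the union of the two tissue sets as the denominator for the shared fraction are all assumptions made for illustration. The p-value is taken as an input because the abstract does not name the statistical test used.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def log2_fold_change(usage_a, usage_b, pseudocount=1.0):
    """log2 ratio of PAS usage in condition A over condition B.

    The pseudocount (an assumption, not from the abstract) avoids division
    by zero when a site has no supporting reads in one condition.
    """
    return math.log2((usage_a + pseudocount) / (usage_b + pseudocount))


def is_differential(usage_a, usage_b, p_value, alpha=0.05, min_abs_log2fc=1.0):
    """Apply the abstract's thresholds: p < 0.05 and |log2FC| >= 1."""
    return (p_value < alpha
            and abs(log2_fold_change(usage_a, usage_b)) >= min_abs_log2fc)


def shared_fraction(n_a, n_b, n_shared):
    """Fraction of sites found in both sets, relative to the union of the sets."""
    return n_shared / (n_a + n_b - n_shared)


if __name__ == "__main__":
    # Liver and testis PAS counts as reported in the abstract.
    print(round(100 * shared_fraction(12777, 14375, 4752)))
```

With the reported counts, the union of liver and testis PASs is 12777 + 14375 - 4752 = 22400, and 4752 / 22400 rounds to 21%, which is consistent with the abstract's "21% of the total" under the union assumption.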
[Degree-granting institution]: Huazhong Agricultural University
[Degree level]: Doctorate
[Year degree conferred]: 2016
[Classification number]: S828

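[Similar literature]

Related journal articles (top 1)

1 Song Dexiu, Yu Jiankang; Comparison of the capacity of several chromatography columns to adsorb polyadenylic acid and polyadenylated messenger RNA [J]; Chinese Journal of Zoology (动物学杂志); 1979, No. 03

Related conference papers (top 1)

1 Wan Liang; Sun Yu; Fu Yonggui; Xu Anlong; Application of high-throughput sequencing technology in alternative polyadenylation research [A]; Life Sciences (生命科学), special issue: New Technologies and Methods in RNA Research (Vol. 26, No. 3) [C]; 2014

Related doctoral dissertations (top 1)

1 Wang Hongyang; Copy Number Variation Analysis and Discovery of Polyadenylation Sites in the Pig Genome [D]; Huazhong Agricultural University; 2016

Related master's theses (top 1)

1 Duan Jiangbo; Prediction of PolyA Sites in Human Genes [D]; Huazhong University of Science and Technology; 2008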



Article ID: 2235464


Article link: https://www.wllwen.com/shoufeilunwen/nykjbs/2235464.html

