Construction of a Hybrid Second- and Third-Generation Genome Assembly Pipeline and a Fine-Scale Genome Map of Lentinus edodes
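Topic: Illumina sequencing. Focus: single-molecule real-time sequencing. Source: Kunming University of Science and Technology, master's thesis, 2017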
[Abstract]: Third-generation sequencing technology has advanced rapidly in recent years. This thesis combines third-generation sequencing with second-generation sequencing, exploiting the long reads of the former and the high accuracy of the latter to make highly repetitive and highly heterozygous regions easier to assemble, and obtains a high-quality genome map of Lentinus edodes (shiitake). From the analysis of the shiitake genome assembly, we distill a complete assembly pipeline that can serve as a reference for assembling other complex genomes. The final hybrid assembly has a total length of 45.7 Mb and a contig N50 of 630 kb, and it far surpasses the second-generation assembly in both completeness and contiguity.

Combining ab initio, homology-based, and EST-based prediction, we obtained 12,511 genes and 14,616 gene models, with an average gene length of 1,952 bp. Functional annotation against the NR, Swiss-Prot, GO, and Pfam databases assigned a function to 11,255 genes; the remaining 1,256 genes could not be functionally annotated.

Repeat analysis shows that 21.56% of the shiitake genome consists of repetitive sequence. Retrotransposons account for about 16.48% of the genome, and the Gypsy family alone makes up 12.00% of the whole genome; further analysis indicates that these transposons formed relatively recently. Non-coding RNAs play important roles in organisms. Using several software tools and databases, we identified 317 tRNAs, 30 rRNAs (10 each of 5.8S, 18S, and 28S), 14 snoRNAs, and 35 snRNAs in the shiitake genome.

Comparing the second-generation assembly with the hybrid assembly revealed many "gap" regions between them. Many of these gap regions contain large numbers of genes and repeats; it is precisely this high repeat content that left the second-generation assembly incomplete, with many fragments missing. The hybrid second- and third-generation approach largely resolves the repeat problem and yields a more complete final assembly.

Carbohydrates are the most widespread and abundant class of compounds in nature, and the enzymes that act on them are classified by function into glycoside hydrolases, glycosyltransferases, polysaccharide lyases, and carbohydrate esterases. We identified 472 carbohydrate-active enzymes in the shiitake genome. Several of these gene families are involved in the degradation and utilization of carbohydrates such as sugars, cellulose, and hemicellulose, which suggests that shiitake has broad prospects for making use of industrial waste such as bagasse and straw.
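The contig N50 quoted in the abstract is a contiguity statistic: with contigs sorted from longest to shortest, it is the length of the contig at which the running total first reaches half of the total assembly length. The following is a minimal sketch of that standard definition, not the thesis's own tooling; the function name `n50` and the toy contig lengths are hypothetical and chosen only for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths, not data from the thesis.
toy_contigs = [800, 600, 400, 200, 100]
print(n50(toy_contigs))
```

For the toy input the total is 2,100 bp; the two longest contigs together cover 1,400 bp, which is more than half, so the N50 is 600. A higher N50 for the same total length means the assembly is split into fewer, longer pieces.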
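[Degree-granting institution]: Kunming University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: S646.12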
Article ID: 1666973
Article link: https://www.wllwen.com/shoufeilunwen/zaizhiyanjiusheng/1666973.html