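Reannotation of Protein-Coding Genes in the Shewanella loihica PV-4 Genome
Published: 2018-06-28 08:24
Topics: gene prediction + genome reannotation; Source: Shandong Normal University, master's thesis, 2016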
[Abstract]: Prediction of protein-coding genes in prokaryotic genomes has been under way for nearly 30 years. Because prokaryotic genes lack introns, gene prediction in prokaryotes has long been regarded as relatively simple. However, a growing body of research shows that different gene prediction algorithms give substantially different results. The resulting accumulation of false positives and false negatives has left misannotated protein-coding genes widespread in bioinformatics databases, which lowers the quality of those databases and can even lead to erroneous research conclusions. This work therefore combines a reannotation algorithm for over-annotated protein-coding genes in prokaryotic genomes with an ab initio gene prediction algorithm to propose a reannotation algorithm for prokaryotic protein-coding genes. The algorithm was applied to the genome of Shewanella loihica PV-4, a strain with important applications in bioenergy and environmental remediation, and identified 1 over-annotated gene and 30 under-annotated genes. Evaluated on protein-coding genes of known function, the prediction performance indices Ac, MCC and AUC were 99.93%, 0.9986 and 0.9999, respectively. Functional prediction of the 30 new genes using BLAST, COG and other methods assigned a definite biological function to 6 under-annotated protein-coding genes and placed 2 under-annotated genes in COG category "R". Of the 6 under-annotated genes with a definite biological function, 2 encode phosphoribosylglycinamide formyltransferase 2, 2 encode glucose-1-phosphate thymidylyltransferase, 1 encodes a membrane protein and 1 encodes a transposase; these genes may play important roles in processes such as ion exchange and protein modification. Further analysis indicates that the reannotation algorithm constructed here is accurate and reliable.
On this basis, the algorithm was extended to nine other Shewanella genomes, yielding 64 over-annotated genes and 1036 under-annotated genes. Further functional analysis found that 261 of the under-annotated genes have a definite biological function and 259 have a COG functional classification. Among the 261 under-annotated genes with a definite function, "transposase" was the largest class with 123 genes, about 47% of the total; there were also 16 "integrase" genes, 5 "dehydrogenase" genes and 3 "cytochrome C" genes, among others. These functional genes play an indispensable role in ion exchange, signal transduction and related processes. Among the 259 under-annotated genes with a COG classification, 182 new genes are related to cytochrome C, indicating involvement in ion transport and protein modification; 30 new genes are related to flagellum genes, indicating a close connection with cell motility; and 48 new genes are related to chemotaxis proteins, indicating a close connection with cell motility and signal transduction. This work thus provides reliable data support for in-depth study of S. loihica PV-4 and some other Shewanella strains, and offers a new approach for future prokaryotic genome annotation.
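The abstract reports Ac, MCC and AUC without giving their formulas. As a rough illustration only, the sketch below computes the three indices under their common textbook definitions: accuracy as the fraction of correct calls, the Matthews correlation coefficient from a confusion matrix, and AUC as the probability that a true gene scores higher than a non-gene. These definitions, the function names and the sample numbers are assumptions made for illustration; the thesis may define Ac differently (some gene-finding papers use the mean of sensitivity and specificity), and none of the inputs are data from the thesis.

```python
import math


def accuracy(tp, fp, tn, fn):
    """Fraction of correct calls (assumed definition of Ac)."""
    return (tp + tn) / (tp + fp + tn + fn)


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from a 2x2 confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventional fallback when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


def auc(pos_scores, neg_scores):
    """Rank-based AUC: chance that a positive outscores a negative.

    Ties count as half a win. Both lists must be non-empty.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical counts: coding ORFs are positives, non-coding ORFs negatives.
    print(accuracy(tp=90, fp=5, tn=95, fn=10))
    print(mcc(tp=90, fp=5, tn=95, fn=10))
    # Hypothetical classifier scores for two coding and two non-coding ORFs.
    print(auc([0.9, 0.8], [0.1, 0.8]))
```

The pairwise AUC loop is quadratic in the number of ORFs, which is acceptable for a sketch; a sort-based rank computation would be the usual choice at genome scale.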
[Degree-granting institution]: Shandong Normal University
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: Q78
Article ID: 2077360
Article link: https://www.wllwen.com/kejilunwen/jiyingongcheng/2077360.html