Prediction of Bacterial Essential Genes and Analysis of Their Evolutionary Characteristics
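Topics: essential genes + composition features; Source: master's thesis, University of Electronic Science and Technology of China (电子科技大学), 2016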
【Abstract】: Essential genes play a critical role in bacterial survival: the proteins they encode ensure normal survival and reproduction. Once the essential genes of a pathogen have been identified, they can serve as drug targets for treating disease; theoretical study of bacterial essential genes also helps us understand the origin and evolution of life. Predicting bacterial essential genes has therefore become an increasingly important topic in bioinformatics. Experimental methods are undoubtedly the most accurate way to identify essential genes, but they are slow, laborious, and expensive, so essential genes have been determined for only a small number of species to date, and theoretical methods are receiving growing attention. This thesis takes bacterial essential genes as its main subject and predicts them with a theoretical approach based on composition features. We first extracted composition features from the Escherichia coli genome sequence according to its annotation file. We then classified the composition variables with a support vector machine (SVM) and with principal component regression (PCR), using the area under the curve (AUC) to measure classifier performance. This is the first time principal component regression has been applied to the prediction of bacterial essential genes. The AUC was 0.83 for SVM and 0.87 for PCR. We next refined both methods. Before applying the SVM, we performed a feature analysis of the composition variables (ttSVM) to filter out variables that show no clear difference between essential and non-essential genes. For principal component regression, we added a kernel function (KPCR) to improve its ability to classify nonlinear features. After these refinements, ttSVM reached an AUC of up to 0.87 and KPCR reached 0.84. We then applied all four methods to every other species whose essential genes have been determined experimentally, obtaining AUC values as high as 0.95. Finally, using the species with AUC above 0.8, we built prediction models and set up a free web service, IBEG (http://cefg.uestc.edu.cn/ibeg/), with which researchers can both predict the essentiality of uncharacterized genes using different methods and compare the strengths and weaknesses of those methods.
In addition, we compared essential genes, genes with high codon usage, and highly expressed genes across species from two perspectives: functional genes and horizontally transferred genes. Among functional genes, essential genes account for the largest proportion, indicating that a relatively large share of essential genes carry a function, and that the more important a gene's function is to the organism, the more conserved its evolution. Among horizontally transferred genes, essential genes also account for the largest proportion, indicating that the functions of essential genes include some housekeeping genes, which are therefore prone to horizontal transfer. In summary, this thesis applies new methods to the composition-based prediction of bacterial essential genes, adds new composition features, and examines these genes from an evolutionary perspective. Some problems remain that require further study and refinement.
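The abstract does not spell out the composition features or the filtering statistic, so the following is only a minimal sketch under stated assumptions. It assumes the composition features are relative k-mer frequencies of the gene sequence, that the feature analysis step can be approximated by a Welch t statistic with a cutoff, and that AUC is computed as the probability that a randomly chosen essential gene scores higher than a randomly chosen non-essential one. The function names, the cutoff of 2.0, and the choice of k-mers are hypothetical and are not taken from the thesis.

```python
from collections import Counter
from itertools import product
from math import copysign, inf, sqrt
from statistics import mean, variance


def composition_features(seq, k=1):
    """Relative frequency of each DNA k-mer in seq (assumed feature set)."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only unambiguous k-mers are counted; windows containing N are ignored.
    total = sum(counts[m] for m in kmers) or 1
    return [counts[m] / total for m in kmers]


def welch_t(a, b):
    """Welch's t statistic for two samples (each needs at least 2 values)."""
    diff = mean(a) - mean(b)
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # Zero spread in both groups: no difference, or perfect separation.
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / se


def select_features(X, y, threshold=2.0):
    """Indices of features whose |t| between the two classes exceeds threshold.

    X is a list of feature rows, y holds 1 (essential) or 0 (non-essential).
    The threshold is an illustrative cutoff, not a value from the thesis.
    """
    keep = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        if abs(welch_t(pos, neg)) > threshold:
            keep.append(j)
    return keep


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, `auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])` evaluates to 0.75, because three of the four essential/non-essential pairs are ranked correctly. In a fuller pipeline the retained feature columns would be passed to a classifier such as an SVM, and its decision scores to `auc`.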
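【Degree-granting institution】: University of Electronic Science and Technology of China (电子科技大学)
【Degree level】: Master's
【Year degree conferred】: 2016
【Classification number】: Q933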
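【Related literature】
China Academic Journals Full-text Database, top 3 entries
1 叶远浓; 郭锋彪; Current status of theoretical research on microbial essential genes [J]; 遗传; 2012, No. 04
2 沈露露; 杜敏; 林兴凤; 蔡婷; 王大勇; Genes required for the function of the olfactory neuron AWA regulate aging of Caenorhabditis elegans in an insulin-signaling-dependent manner (in English) [J]; Neuroscience Bulletin; 2010, No. 02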
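China Proceedings of Conferences Full-text Database, top 2 entries
1 张春霆; Research on bacterial essential genes and the minimal genome [A]; Proceedings of the 5th National Conference on Bioinformatics and Systems Biology [C]; 2012
2 郭锋彪; 宁绿文; 黄健; 林昊; 张会雄; Abnormal distribution of essential genes on the three chromosomes of Burkholderia cenocepacia strain AU-1054 [A]; Genetics Research in China: Advances in Genetics Promoting Economic and Social Development in Western China: Abstracts of the 2011 Congress of the Genetics Society of China [C]; 2011
China Doctoral Dissertations Full-text Database, top 2 entries
1 叶远浓; Cluster model of bacterial essential genes and construction of a minimal gene set [D]; University of Electronic Science and Technology of China; 2015
2 林岩; Analysis of microbial essential gene data [D]; Tianjin University; 2010
China Master's Theses Full-text Database, top 4 entries
1 林丹; Prediction and analysis of functional genes in multiple microorganisms [D]; University of Electronic Science and Technology of China; 2014
2 邓炎炎; Prediction of bacterial essential genes and analysis of their evolutionary characteristics [D]; University of Electronic Science and Technology of China; 2016
3 罗森; Research and implementation of a self-training algorithm for bacterial essential genes [D]; University of Electronic Science and Technology of China; 2016
4 窦运涛; Development of the prokaryotic gene-finding program ZCURVE 1.02 and analysis of microbial essential genes [D]; Tianjin University; 2005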