Construction of the Bacterial sRNA Target Database 3.0 and Functional Annotation Research
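Published: 2018-01-16 04:24
Keywords: Construction of the Bacterial sRNA Target Database 3.0 and Functional Annotation Research. Source: Academy of Military Medical Sciences of the Chinese People's Liberation Army, 2016 doctoral dissertation. Document type: dissertation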
[Abstract]: Bacterial sRNAs are important regulatory RNAs involved in many biological processes, such as metabolism, quorum sensing, biofilm formation, iron regulation, and virulence regulation. They act mainly by binding to target mRNAs or proteins. Systematically collecting experimentally verified bacterial sRNA targets and developing a corresponding database management and analysis system therefore not only helps in understanding sRNA function and mechanisms in depth, but also supports the development of bacterial sRNA target prediction models.

The databases currently related to bacterial sRNAs are mainly sRNAMap, sRNAdb, Rfam, RegulonDB, NPInter, BSRD, and sRNATarBase, each with its own emphasis in data collection and annotation. sRNAMap is a database of sRNAs in Gram-negative bacteria that contains 397 sRNAs, 62 sRNA transcription factors, and 60 sRNA targets from 70 microbial genomes. It also provides sRNA secondary structure predictions, sRNA expression conditions, and sRNA expression levels. sRNAdb is a platform that collects sRNAs of Gram-positive bacteria. It covers 558 Gram-positive bacterial genomes and plasmids, 671 experimentally verified bacterial sRNAs, and 9993 predicted bacterial sRNAs, and it can analyze user-submitted sRNA data to find homologous sRNAs. Rfam mainly collects ncRNA families from eukaryotes and prokaryotes and provides secondary structure information. For bacterial sRNAs it mainly collects sequence information and does not cover sRNA targets. RegulonDB is a database of the transcriptional regulatory network of Escherichia coli K-12, including transcription units (TUs), promoters, and transcriptional regulators (TRs). It contains 110 sRNAs and 227 sRNA-target interactions, of which 53 have target mRNA binding sites. NPInter mainly collects experimentally verified interactions between noncoding RNAs (excluding tRNA and rRNA) and other biomolecules (proteins, RNA, and genomic DNA). NPInter v2.0 contains 201107 interaction entries covering 18 species, including 32 bacterial sRNAs and 107 bacterial sRNA-target interactions, but it records no binding site information. BSRD, developed by Huang et al. in 2013, is a comprehensive bacterial sRNA database that systematically collects bacterial sRNA information and integrates a large amount of annotation. By integrating other databases and manually curating the literature, BSRD collected 897 experimentally verified bacterial sRNAs, 8248 sRNA homologs, and 507 candidate sRNAs predicted from high-throughput sequencing data. For sRNA targets, it mainly integrates predicted sRNA targets and the target information provided by sRNATarBase. sRNATarBase, developed by our group in 2010, is a database of experimentally verified bacterial sRNA targets. It contains 392 entries covering 17 bacterial genomes, with 68 sRNAs and 227 targets (or non-targets), and in particular it includes binding site information for sRNA-mRNA interactions.

This survey shows that, apart from sRNATarBase, no current database provides complete bacterial sRNA target information, and in particular none provides sRNA-mRNA interaction sites, which hinders the development of models for predicting the mRNA targets of sRNAs. In addition, sRNATarBase has not been updated for a long time. This project therefore builds a new bacterial sRNA target database on the basis of sRNATarBase and carries out functional annotation research on top of it.

To build a bacterial sRNA target database with comprehensive data and rich functionality, this study collected data using three strategies. (1) Based on the latest NCBI genome annotation and the source literature of each sRNATarBase 2.0 entry, the 392 records of version 2.0 were comprehensively verified and systematically updated, including the NCBI identifier links of sRNAs and targets, genomic positions, sequences, and the coordinates of the various sites. (2) Using different keyword combinations, such as "bacterial sRNA target" and "bacterial small regulatory RNA target", PubMed was searched for literature on bacterial sRNA targets, yielding 3124 articles published between January 1, 2010 and June 1, 2015. Based on their abstracts, 120 articles containing bacterial sRNA target data were selected, then read in detail to extract the required sRNA-target information and experimental evidence. (3) To avoid missing target data, sRNA-target data sets were extracted from the publications of all bacterial sRNA target prediction tools and compared with the data in the database. As of June 1, 2015, the database contains 771 sRNA-target entries from 53 genomes, comprising 492 experimentally verified bacterial sRNA-target interactions and 279 entries with no interaction. Of these, 752 are sRNA-mRNA records and 19 are sRNA-protein records.

We also set up a new database web server to serve users better. The database website (http://ccb1.bmi.ac.cn/srnatarbase/) provides six main functions. (1) Searching the database by general information (sRNA information, target information, sRNA-target interaction information, and experimental evidence), by sequence (BLAST), and by literature, with support for multi-condition combined queries. (2) Dynamic display of RNA secondary structure. (3) NCBI sequence view of bacterial sRNA-target interactions. (4) Display of bacterial sRNA-target regulatory networks. (5) Target prediction based on sRNATarget and sTarPicker, followed by functional enrichment analysis of the predicted targets, with DAVID, GOEAST, and PANTHER offered as annotation platforms. (6) Phylogenetic analysis, used to examine the conservation of sRNA-target interactions across closely related genomes.

In the database we found that some sRNAs have multiple targets and that some targets are regulated by multiple sRNAs. To study the relationship between one sRNA and a group of targets, or between one target and a group of sRNAs, we developed the online server CosTar, a tool for analyzing the co-regulation of bacterial sRNA targets. For a set of sRNAs (or genes) produced in an experiment, for example a set of genes differentially expressed under different conditions, CosTar predicts a list of genes (or sRNAs) that may interact with them, and thereby provides guidance for further experiments. We obtained 897 sRNA sequences from BSRD and downloaded the latest bacterial genome sequences from NCBI. We then ran the two prediction tools sRNATarget and sTarPicker in batch on the selected sRNAs and stored the results in a uniform format in a predicted-target database. When the input is a group of sRNAs, we use the hypergeometric distribution to compute a P value for each mRNA and rank all targets by P value. mRNAs whose P value is below a given threshold can be taken as predicted targets of that group of sRNAs. For the convenience of researchers, CosTar is provided as an online analysis server.

In summary, this work centers on bacterial sRNAs and consists of two parts. (1) We built the bacterial sRNA target database 3.0. The database contains 771 records from 213 articles, including 492 experimentally verified bacterial sRNA-target interactions and 316 binding sites. Compared with other bacterial sRNA databases (RegulonDB, BSRD, sRNAMap, NPInter, and others), sRNATarBase 3.0 not only provides the most recent and most complete bacterial sRNA target data, but also includes 316 binding sites and the mutation information from the experiments. In addition, the new website provides NCBI sequence display, sRNA regulatory networks, predicted targets with GO annotation, and phylogenetic analysis, making sRNATarBase 3.0 a feature-rich bacterial sRNA target database. (2) We built CosTar, an online server for predicting the co-regulation of bacterial sRNA targets. CosTar provides two functions, sRNA-Gene and Gene-sRNA: it can predict the target mRNAs co-regulated by a group of sRNAs, and it can predict the sRNAs that regulate a group of target mRNAs.

The main features and innovations of this work are threefold. (1) The bacterial sRNA target database 3.0 holds the most comprehensive bacterial sRNA target data and can provide comprehensive, accurate data for related research, such as the development of bacterial sRNA target prediction models. (2) The database website provides multiple tools, including NCBI genome display, sRNA regulatory networks, and GO analysis, so that sRNA target data can be interpreted from several angles. (3) CosTar is the first online tool to analyze bacterial sRNA-target data from the perspective of co-regulation.
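The hypergeometric ranking step described for CosTar can be illustrated with a minimal sketch. The abstract does not give the exact parameterization, so everything below is an assumption made for illustration: the population is taken to be all sRNAs in the predicted-target database (N), K is the number of those sRNAs predicted to target a given mRNA, n is the size of the query set, k is the number of query sRNAs predicted to target the mRNA, and the P value is the upper tail P(X >= k). The function names `hypergeom_tail` and `rank_coregulated_targets`, the data layout, and the default threshold are hypothetical and are not taken from CosTar.

```python
from math import comb


def hypergeom_tail(k, N, K, n):
    """Upper-tail probability P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: population size, K: successes in the population,
    n: number of draws, k: observed successes among the draws.
    """
    upper = min(K, n)
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / comb(N, n)


def rank_coregulated_targets(query_srnas, predicted, all_srnas, threshold=0.05):
    """Rank mRNAs by how enriched their predicted regulators are in the query set.

    predicted: dict mapping an mRNA name to the set of sRNAs predicted to
               target it (hypothetical layout, assumed for this sketch).
    Returns (mRNA, k, P value) tuples with P below the threshold, sorted by P.
    """
    population = set(all_srnas)
    query = set(query_srnas) & population
    N, n = len(population), len(query)
    results = []
    for mrna, regulators in predicted.items():
        regulators = regulators & population
        k = len(regulators & query)
        if k == 0:
            continue  # no query sRNA is predicted to target this mRNA
        p = hypergeom_tail(k, N, len(regulators), n)
        results.append((mrna, k, p))
    results.sort(key=lambda item: item[2])
    return [item for item in results if item[2] < threshold]


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    all_srnas = [f"s{i}" for i in range(1, 11)]
    predicted = {
        "geneA": {"s1", "s2", "s3"},
        "geneB": {"s1", "s4", "s5", "s6", "s7"},
        "geneC": {"s8"},
    }
    for hit in rank_coregulated_targets(["s1", "s2", "s3"], predicted, all_srnas):
        print(hit)
```

In the toy data, geneA is predicted to be targeted by exactly the three query sRNAs out of ten, so its P value is 1/120 and it passes the 0.05 threshold, while geneB shares only one regulator with the query and is filtered out. Swapping the roles of sRNAs and mRNAs would give the corresponding Gene-sRNA direction under the same assumptions.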
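[Degree-granting institution]: Academy of Military Medical Sciences of the Chinese People's Liberation Army
[Degree level]: Doctoral
[Year degree conferred]: 2016
[Classification number]: Q78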
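[Similar literature]
Related journal articles
1 刘倩; 应晓敏; 吴佳td; 查磊; 李伍举; Predicting Escherichia coli sRNAs based on transcription terminus sequence features [J]; Acta Biophysica Sinica; 2011, No. 03
2 赵小凯; 竹俊兰; 严浩; 王慧利; Research progress on the functions, prediction, and identification methods of bacterial sRNAs [J]; Journal of Wenzhou Medical College; 2012, No. 05
Related doctoral dissertations
1 王江; Construction of the Bacterial sRNA Target Database 3.0 and Functional Annotation Research [D]; Academy of Military Medical Sciences of the Chinese People's Liberation Army; 2016
2 徐杰; Transcriptome sequencing analysis of Brucella and functional study of its sRNAs [D]; Jilin University; 2013
Related master's theses
1 汪屹; Structure and function of yigP, an sRNA-encoding gene of Escherichia coli [D]; East China University of Science and Technology; 2012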
Article ID: 1431550
Article link: https://www.wllwen.com/shoufeilunwen/jckxbs/1431550.html