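A Bioinformatics Study of Human Ubiquitin Ligase-Substrate Interactions
Published: 2018-04-11 04:15
Topics: bioinformatics + ubiquitin ligase; Source: doctoral dissertation, Academy of Military Medical Sciences of the Chinese People's Liberation Army, 2017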
【Abstract】: Ubiquitin is a small protein of 76 amino acids. Through the carboxyl group at its C-terminus, it forms an isopeptide bond with the amino group of a lysine residue and is thereby covalently attached to specific proteins, a process called ubiquitination. Ubiquitination regulates the proteasomal degradation of 80% of the proteins in eukaryotic cells and also regulates many other processes, such as inflammatory signaling pathways and DNA damage repair. It is closely associated with the onset and progression of Alzheimer's disease, Parkinson's disease, and many cancers. Studying ubiquitination is therefore important for understanding the regulated expression of genetic information and the onset and progression of many diseases. During ubiquitination, the interaction between the ubiquitin ligase (E3) and its substrate determines substrate specificity and is the key to understanding the precise regulation of the ubiquitin system. A variety of methods based on high-throughput protein detection technologies (for example, global protein stability profiling, protein microarrays, phage display libraries, and mass spectrometry) have been developed to identify ubiquitin ligase-substrate interactions. However, because ubiquitin ligase substrates are expressed at low levels and ligase-substrate interactions are weak, high-throughput experimental methods tend to be inefficient and costly. As a result, although about 30,000 ubiquitination sites on nearly 5,700 substrate proteins have been identified, databases contain only 861 ubiquitin ligase-substrate pairs, which means that ligase information is available for only about 15% of ubiquitinated proteins. A bioinformatics method that can effectively predict ubiquitin ligase-substrate interactions at the proteome scale is therefore urgently needed.

To address this challenge, this thesis builds a prediction model for human E3-substrate interactions (ESIs). First, to construct the gold-standard dataset required by the model, a strategy was developed for extracting ESI information from the literature. Abstracts likely to contain ESI information were retrieved from the PubMed and Web of Knowledge databases, analyzed with text-mining tools, and verified by manual curation. This yielded a dataset of 1,315 ESIs, currently the largest dataset of its kind. Using this dataset, an ESI network was constructed, and the network was found to be scale-free. Based on the publication date of the source literature, the dataset was divided into a gold-standard positive set (before January 1, 2010) and an independent positive test set (after January 1, 2010). Because an ideal, experimentally validated negative set is difficult to obtain, the gold-standard negative set was built by randomly sampling, from proteins that interact with E3s, ligase-protein pairs that appear in neither the gold-standard positive set nor the independent positive test set.

Next, five classes of biological features usable for ESI prediction were constructed: homologous ESIs, domain pairs enriched among ESIs, GO term pairs enriched among ESIs, loops in the protein interaction network, and potential substrate sequence motifs recognized by ubiquitin ligases. The analysis suggests that ubiquitin ligases may bind substrates through specific domains or recognize specific sequence motifs on substrates, and that ligases and substrates tend to form 3-node and 4-node loops in the interaction network. Likelihood-ratio evaluation showed that all five feature classes are effective predictors of ESIs, and that the framework helps uncover potential substrate domains and motifs recognized by ubiquitin ligases. For example, the predicted interaction between the "TP53 DNA-binding domain" and the ubiquitin ligase WWP1 (enrichment ratio: 7.21) and the "KEN" motif recognized on substrates by the APC/C ubiquitin ligase complex (motif score: 16.13) are both confirmed by the literature.

A naive Bayes classifier was then used to integrate the five feature classes into an ESI prediction model. In five-fold cross-validation, the integrated model achieved an area under the ROC curve of 0.827, higher than that of any single-feature model, indicating good predictive performance. On the independent test set (all ESIs reported after January 1, 2010), the area under the ROC curve was 0.733, indicating that the model can discover new ESIs. Finally, the model was used to predict ESIs across the human proteome, and an online platform for displaying them, UbiBrowser (http://ubibrowser.ncpsb.org), was built. UbiBrowser supports multiple ways of submitting data and, through network, list, and sequence views, displays predicted ESIs, literature-derived ESIs, protein ubiquitination sites, and the domains and motifs that may be recognized. The supporting evidence behind each prediction is also shown to the user.

UbiBrowser was used to predict a number of disease-related ESIs, and these predictions are supported by recently published literature. The potential ligase-substrate pair Smurf1-Smad3 was then selected for experimental study. The results show that, in an overexpression system, Smurf1 mediates the ubiquitination of Smad3, which further indicates that UbiBrowser can help experimental researchers discover new ESIs. In summary, to reveal ubiquitin ligase-substrate relationships efficiently, this thesis completed a series of tasks spanning data collection, prediction model construction, and development of an online browsing platform, and provides users with the first ESI browser covering all human proteins. This work helps researchers discover new ubiquitin ligase-substrate interactions and supports a deeper understanding of how ligases select substrates during ubiquitination.
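The abstract describes integrating five feature classes with a naive Bayes classifier after evaluating each by its likelihood ratio. The following is a minimal sketch of that general technique, not the thesis's actual implementation. It assumes each feature is reduced to a binary present/absent indicator, and the function names, pseudocount, prior odds, and toy data are all hypothetical.

```python
import math


def fit_likelihood_ratios(pos_rows, neg_rows, n_features, pseudocount=1.0):
    """Per binary feature, estimate LR = P(value | positive) / P(value | negative).

    Rows are tuples of 0/1 indicators. The pseudocount (an assumption of this
    sketch) avoids division by zero on small training sets.
    """
    ratios = []
    for j in range(n_features):
        pos_hits = sum(row[j] for row in pos_rows)
        neg_hits = sum(row[j] for row in neg_rows)
        p_pos = (pos_hits + pseudocount) / (len(pos_rows) + 2 * pseudocount)
        p_neg = (neg_hits + pseudocount) / (len(neg_rows) + 2 * pseudocount)
        ratios.append({1: p_pos / p_neg, 0: (1 - p_pos) / (1 - p_neg)})
    return ratios


def combined_log_lr(row, ratios):
    """Naive Bayes assumption: features are independent given the class,
    so the combined likelihood ratio is the product of per-feature ratios
    (a sum in log space)."""
    return sum(math.log(ratios[j][value]) for j, value in enumerate(row))


def posterior_probability(row, ratios, prior_odds):
    """Posterior odds = prior odds * combined likelihood ratio."""
    odds = prior_odds * math.exp(combined_log_lr(row, ratios))
    return odds / (1.0 + odds)


# Toy data (hypothetical): five binary features per E3-protein pair, standing in
# for homology, domain pair, GO term pair, network loop, and motif evidence.
toy_pos = [(1, 1, 0, 1, 0), (1, 0, 1, 1, 1), (0, 1, 1, 0, 1), (1, 1, 1, 1, 0)]
toy_neg = [(0, 0, 0, 1, 0), (0, 1, 0, 0, 0), (0, 0, 0, 0, 0), (1, 0, 0, 0, 1)]
toy_ratios = fit_likelihood_ratios(toy_pos, toy_neg, n_features=5)

# prior_odds=0.01 is an arbitrary illustrative value, not taken from the thesis.
strong = posterior_probability((1, 1, 1, 1, 1), toy_ratios, prior_odds=0.01)
weak = posterior_probability((0, 0, 0, 0, 0), toy_ratios, prior_odds=0.01)
print(f"all evidence present: {strong:.3f}, no evidence: {weak:.5f}")
```

Working in log space keeps the product of many ratios numerically stable. A pair supported by several feature classes accumulates a larger combined ratio than any single feature provides, which is consistent with the abstract's report that the integrated model outperformed every single-feature model.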
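【Degree-granting institution】: Academy of Military Medical Sciences of the Chinese People's Liberation Army
【Degree level】: Doctoral
【Year conferred】: 2017
【Classification number】: R3416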
Article ID: 1734384
Article link: https://www.wllwen.com/yixuelunwen/jichuyixue/1734384.html