Research and Application of Protein Binding Site Prediction Methods

Published: 2018-03-12 14:46

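  Topic: protein binding sites  Focus: amino acid composition preference  Source: Dalian University of Technology, 2012 doctoral dissertation  Type: degree thesis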


[Abstract]: Biomolecules and many other organic ligands bind to proteins with high affinity at specific sites on the protein surface. How to distinguish such binding sites from the rest of the protein surface is a frontier problem in protein research. In recent years, predicting likely binding regions on protein surfaces has become increasingly valuable, and as structural knowledge of biologically and medically important proteins keeps growing, such prediction methods have become more practical. They can support rational drug design and also help reveal protein function. Both applications, function prediction and rational drug design, require a reliable method for identifying and defining protein-ligand binding sites. When the three-dimensional structure of a protein complex is known, protein-protein interaction interfaces and protein-ligand binding surfaces can be analyzed systematically in terms of amino acid distribution and physicochemical features, which makes the identification of active sites possible. Many computational methods have been developed that use this information to predict probable binding sites. However, current methods still fall short in prediction accuracy and efficiency, so further study of binding site prediction methods is needed to improve their predictive power and to reveal the key factors that influence it.
This thesis studies methods for predicting protein binding sites and consists of four main parts.
Chapter 1 first describes the principles of protein-ligand interaction, including thermodynamic theory, theoretical models of the binding process, and physical properties. It then surveys the state of research on protein binding site prediction, covering both protein-ligand and protein-protein binding site prediction. Finally, it briefly introduces the main work of the thesis and the results obtained.
Chapter 2 proposes two new models for representing amino acid composition preference, which use atoms and atom contact pairs, respectively, as the statistical objects, in contrast to traditional models that use residues. Tests of a ligand-binding pocket identification method based on global pocket preference show that the atom-based and atom-contact-pair-based models outperform the residue-based model. Because binding sites contain so-called hot spot regions, we define the local region with the highest preference value as the hot spot of a pocket and let this local preference value represent the preference of the whole pocket; combining it with pocket size yields a ligand-binding pocket identification method based on local pocket preference. Analysis of the results shows that the two attributes reinforce each other and greatly improve identification performance. Compared with several prediction methods published in the literature, our method achieves comparable accuracy while being computationally simple.
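As a rough illustration of what an atom-level composition preference could look like, the sketch below computes a log-odds value for each atom type: its frequency among binding-site atoms divided by its frequency among all surface atoms. The log-ratio form, the atom-type labels, and the function names are assumptions made for illustration; the abstract does not give the formula actually used in the thesis.

```python
import math
from collections import Counter


def atom_propensities(site_atoms, surface_atoms):
    """Log-odds preference of each atom type for binding sites.

    Both arguments are lists of atom-type labels (hypothetical labels,
    e.g. "N", "O", "C"). Binding-site atoms are assumed to be a subset
    of the surface atoms. Types never seen in a binding site get no entry.
    """
    site_counts = Counter(site_atoms)
    surface_counts = Counter(surface_atoms)
    n_site, n_surface = len(site_atoms), len(surface_atoms)

    propensities = {}
    for atom_type, count in site_counts.items():
        if surface_counts[atom_type] == 0:
            continue  # violates the subset assumption; skip instead of dividing by zero
        freq_site = count / n_site
        freq_surface = surface_counts[atom_type] / n_surface
        propensities[atom_type] = math.log(freq_site / freq_surface)
    return propensities


def pocket_preference(pocket_atoms, propensities):
    """Mean propensity over a pocket's atoms; unknown types count as 0."""
    if not pocket_atoms:
        return 0.0
    total = sum(propensities.get(a, 0.0) for a in pocket_atoms)
    return total / len(pocket_atoms)
```

A positive value means the atom type is enriched in binding sites relative to the surface as a whole. Averaging over a whole pocket corresponds loosely to a global pocket preference, while taking the maximum over local sub-regions would correspond to the hot spot idea described above.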
Chapter 3 starts from the differences between protein-ligand and protein-protein binding sites in geometric features and physicochemical properties and proposes two residue attribute definition models, the single-patch and the multi-patch model. Residue features derived from these models are used to build binding-residue classifiers with the random forest algorithm. In addition, we propose a new clustering method for discovering and predicting binding sites. These methods are applied to the prediction of protein-ligand and protein-protein binding residues, respectively. Using the same datasets and success criteria, the random forest classifier based on the single-patch model is more accurate in protein-ligand binding site prediction than three methods: Q-SiteFinder, SCREEN, and Morita's method. Likewise, balanced accuracy and CC (correlation coefficient) values show that the random forest classifier based on the multi-patch model predicts protein-protein binding residues better than the methods of Yan, Wang, and Chen and Jeong. For protein-protein binding site prediction, the predictor based on the multi-patch model also outperforms Bradford and Westhead's method, Bradford and Needham's method, and Higa and Tozzi's method.
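The two evaluation measures named in this chapter can both be computed from a confusion matrix over residues. The sketch below assumes that CC denotes the Matthews correlation coefficient, which is its usual meaning in binding-residue prediction but is not confirmed by the abstract; the function names are illustrative.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = binding residue)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (sensitivity + specificity) / 2


def correlation_coefficient(tp, tn, fp, fn):
    """Matthews correlation coefficient (assumed meaning of CC)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Both measures are less sensitive to class imbalance than plain accuracy, which matters here because binding residues are typically a small minority of a protein's surface residues.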
Chapter 4 applies the random-forest-based binding site prediction method to assist molecular docking. For protein-ligand docking, the random forest predictor is used at the front end to narrow the conformational search space. Docking results show that, for assisting docking, it outperforms the binding site prediction method in the popular software Accelrys Discovery Studio. For protein-protein docking, the random forest predictor is used at the back end, as a scoring function for selecting near-native conformations. Docking experiments show that the scoring model built on the prediction information and the ZDOCK scoring function each have their own strengths in identifying near-native conformations and are largely complementary.
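One simple way to use residue-level predictions as a back-end scoring function is to score each docked pose by how strongly its interface residues were predicted to be binding residues. The sketch below is only an assumption about what such a scoring model might look like; the pose representation, the mean-probability score, and all names are hypothetical and are not taken from the thesis.

```python
def interface_score(interface_residues, binding_prob):
    """Mean predicted binding probability over a pose's interface residues.

    binding_prob maps a residue id to the probability (0..1) that a
    classifier assigned to it; residues without a prediction count as 0.
    """
    if not interface_residues:
        return 0.0
    total = sum(binding_prob.get(r, 0.0) for r in interface_residues)
    return total / len(interface_residues)


def rank_poses(poses, binding_prob):
    """Return pose ids sorted from highest to lowest interface score.

    poses maps a pose id to the set of residue ids at its interface.
    """
    return sorted(
        poses,
        key=lambda pose_id: interface_score(poses[pose_id], binding_prob),
        reverse=True,
    )
```

A score of this kind uses no energy terms at all, which is one plausible reason a prediction-based ranking and an energy-based or shape-based ranking could complement each other.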
The final part of the thesis summarizes the work and discusses directions for future research.
This work was supported by the National Natural Science Foundation of China project "Research on Grid Computing Methods for Optimal Design of Drug Molecules" (No. 10772042), the National 863 Science and Technology Program project "New Drug Research and Development Network" (No. 2006AA01A124), and the National Key Basic Research Development Program project "Research on New Simulation Methods for Protein Dynamic Behavior and Interactions" (No. 2009CB918501).

[Degree-granting institution]: Dalian University of Technology
[Degree level]: Doctorate
[Year conferred]: 2012
[Classification number]: R341


Article ID: 1602028

Article link: https://www.wllwen.com/xiyixuelunwen/1602028.html

