Identification of Binding Residues for Ten Metal Ions Based on Sequence Information

Published: 2018-05-01 17:39

Topic: metal ion ligands + binding residues; Source: Inner Mongolia University of Technology, master's thesis, 2017

[Abstract]: Metal ions play important roles in life processes. More than one third of protein structures contain metal ions, which take part in enzyme catalysis, maintain protein structure, and act as regulators. All of these functions are realized through interactions between proteins and metal ion ligands. Identifying metal ion binding residues in proteins is therefore highly important and can guide molecular drug design. This thesis identifies metal ion binding residues in proteins from sequence information. The main work is as follows: (1) Binding-residue datasets for ten metal ions (Zn2+, Cu2+, Fe2+, Fe3+, Ca2+, Mg2+, Mn2+, Na+, K+ and Co2+) were taken from the BioLiP database, keeping protein chains with resolution better than 3 Å, sequence length greater than 50 residues, and sequence homology below 30%. A sliding-window method gave optimal window lengths of 7, 13, 9, 9, 9, 9, 7, 9, 11 and 11 for the ten ions, respectively. (2) Statistical analysis of the ten binding-residue datasets showed that the amino acids at each window position are strongly conserved. Using these positional amino acids as feature parameters, a position weight matrix scoring algorithm was applied to identify the binding residues of the ten metal ions. Under five-fold cross-validation, the overall accuracy was above 62.7% and the Matthews correlation coefficient was above 0.335. The results were good for the binding residues of the Zn2+, Cu2+, Fe2+, Fe3+ and Co2+ ligands, but unsatisfactory for those of the Ca2+, Mg2+, Mn2+, Na+ and K+ ligands. (3) To improve the identification results further, the biological background of the binding residues was studied, and amino acid composition, hydropathy, polarization charge, predicted secondary structure, and solvent accessible area information were added as feature parameters.
To avoid overtraining the support vector machine (SVM), matrix scoring and the increment of diversity algorithm were used to reduce the dimensionality of some features, and the optimized features were fed into the SVM to identify the binding residues of the ten metal ions. Five-fold cross-validation gave good results, with overall accuracy above 74.8% and Matthews correlation coefficient above 0.502. SVM runs on combinations of feature parameters were used to analyze how sensitive the binding residues of each metal ion ligand are to each feature. To make the model more practical, the metal ion datasets were also evaluated by independent testing and compared with earlier studies; the prediction trends agree with the earlier results. (4) The random forest algorithm was introduced. With the same combined feature parameters as the SVM, random forest gave slightly lower five-fold cross-validation results than the SVM. With feature parameters that had not been optimized by dimensionality reduction, random forest gave slightly better five-fold cross-validation results than the SVM, especially for the Ca2+, Mg2+ and Mn2+ ligands. (5) A prediction platform was built: an online prediction web service for metal ion ligand binding residues, freely open to the public, to support related research.
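The sliding-window extraction, position weight matrix scoring, and Matthews correlation coefficient (MCC) mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the padding symbol `X`, the uniform background frequencies, the pseudocount of 1, and all function names are assumptions made for illustration only.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"  # assumed padding symbol for positions beyond the chain ends
ALPHABET = AMINO_ACIDS + PAD


def windows(sequence, size):
    """Return one window of odd length `size` centred on each residue."""
    half = size // 2
    padded = PAD * half + sequence + PAD * half
    return [padded[i:i + size] for i in range(len(sequence))]


def build_pwm(segments, pseudocount=1.0):
    """Build a log-odds position weight matrix from equal-length segments.

    Assumes a uniform background frequency over ALPHABET; a real study
    would more likely use background frequencies from the dataset.
    """
    background = 1.0 / len(ALPHABET)
    pwm = []
    for pos in range(len(segments[0])):
        counts = {a: pseudocount for a in ALPHABET}
        for seg in segments:
            if seg[pos] in counts:  # ignore non-standard residue codes
                counts[seg[pos]] += 1
        total = sum(counts.values())
        pwm.append({a: math.log(counts[a] / total / background)
                    for a in ALPHABET})
    return pwm


def score(pwm, segment):
    """Sum the log-odds weights of the residues observed in `segment`."""
    return sum(col.get(ch, 0.0) for col, ch in zip(pwm, segment))


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Toy binding-site windows of length 3 (made-up data, not from BioLiP).
    binding = ["ACH", "GCH", "ACH"]
    pwm = build_pwm(binding)
    for seg in windows("MACHW", 3):
        print(seg, round(score(pwm, seg), 3))
```

In this sketch, a window would be predicted as binding when its score exceeds a threshold chosen on the training folds, and the MCC would then be computed from the resulting confusion-matrix counts on the held-out fold.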
[Degree-granting institution]: Inner Mongolia University of Technology
[Degree level]: Master's
[Year conferred]: 2017
[Classification number]: TP18;O641

[Similar literature]