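Research on the Prediction of Protein 8-State Secondary Structure and Anticancer Peptides
Published: 2018-01-27 15:38
Keywords: protein 8-state secondary structure; quadratic discriminant analysis; chemical shift; anticancer peptide; prediction. Source: Inner Mongolia Agricultural University, 2017 master's thesis. Document type: degree thesis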
[Abstract]: The biological function of a protein is largely determined by its spatial structure, so understanding protein function first requires determining that structure. Secondary structure identification is generally regarded as a key step in predicting a protein's spatial structure. Most secondary structure prediction work has focused on the 3-state classification (alpha-helix, beta-sheet, random coil). Compared with the 3-state classification, the 8-state classification provides more detailed structural information and is therefore more challenging to predict, especially for proteins with low homology. This thesis proposes a new prediction model for protein 8-state secondary structure that combines multiple features with the quadratic discriminant analysis (QDA) algorithm. First, 200 proteins were selected, all with amino acid sequence identity below 30%. Next, the average chemical shifts of six atoms were extracted from these 200 proteins by statistical methods as feature parameters. These chemical shifts were then combined with six hydropathy classes of residues as features to predict the 8-state secondary structure. Finally, under 7-fold cross-validation, the overall prediction accuracy (Q8) reached 80.7%. On the same dataset, the model was compared with other prediction tools: the C8-Scorpion online server, the support vector machine (SVM) algorithm, and the random forest (RF) algorithm. The results show that the proposed model achieves higher prediction accuracy than these other methods.
Anticancer peptides are antimicrobial peptides with marked antitumor activity. They can rapidly eliminate harmful pathogens in vivo and also strongly inhibit human tumor cells. How to identify anticancer peptides effectively has been one of the hot topics in biomedical research for more than a decade. On a published, publicly available anticancer peptide dataset, this thesis adds the three protein secondary structure components (3PSS) as new feature parameters and combines them with the 20 amino acid components (20AAC) and the six hydropathy-class amino acid components (6HP) as feature information, with QDA used for prediction. Under 7-fold cross-validation, the overall prediction accuracy (Acc) reached 86% when 3PSS was combined with 6HP, and 94% when 3PSS was combined with 20AAC. The results show that when amino acid sequence information is used as the feature set, adding secondary structure information improves prediction accuracy to varying degrees. Finally, comparison with other prediction work on the same dataset shows the superiority of the proposed model.
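The composition features named in the abstract (20AAC, 6HP, 3PSS) and the 7-fold split can be illustrated with a short Python sketch. This is not the thesis's code. The six-class hydropathy grouping below is a commonly used one and is an assumption, since the abstract does not list its classes. The function names, the H/E/C secondary-structure alphabet, and the round-robin fold assignment are likewise hypothetical choices made for illustration.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# ASSUMPTION: a commonly used six-class hydropathy grouping. The abstract
# does not list the classes it uses, so this mapping is illustrative only.
HYDROPATHY_CLASSES = {
    "strongly_hydrophilic": "RDENQKH",
    "strongly_hydrophobic": "LIVAMF",
    "weakly_hydrophilic_or_hydrophobic": "STYW",
    "proline": "P",
    "glycine": "G",
    "cysteine": "C",
}

# ASSUMPTION: 3-state secondary structure encoded as H (helix),
# E (strand), C (coil), one letter per residue.
SS_STATES = "HEC"


def composition(text, alphabet):
    """Return the fraction of positions in `text` taken by each symbol."""
    counts = Counter(text)
    total = len(text)
    return [counts[symbol] / total for symbol in alphabet]


def hydropathy_composition(sequence):
    """Return the fraction of residues that fall in each hydropathy class."""
    total = len(sequence)
    return [
        sum(residue in members for residue in sequence) / total
        for members in HYDROPATHY_CLASSES.values()
    ]


def peptide_features(sequence, secondary_structure):
    """Concatenate 20AAC + 6HP + 3PSS into one 29-dimensional vector."""
    if not sequence or len(sequence) != len(secondary_structure):
        raise ValueError("inputs must be non-empty and of equal length")
    return (
        composition(sequence, AMINO_ACIDS)
        + hydropathy_composition(sequence)
        + composition(secondary_structure, SS_STATES)
    )


def k_fold_indices(n_samples, k=7):
    """Yield (train, test) index lists; each sample is held out once."""
    folds = [list(range(start, n_samples, k)) for start in range(k)]
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n_samples) if i not in held]
        yield train, held_out
```

Feature vectors built this way could then be passed, fold by fold, to any QDA implementation, for example scikit-learn's `QuadraticDiscriminantAnalysis`. The abstract does not say which software the thesis used. Because each composition block sums to 1, the class covariance matrices can be singular, so some regularization or dropping one component per block would likely be needed in practice.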
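[Degree-granting institution]: Inner Mongolia Agricultural University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: Q51; R91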
Article ID: 1468808
Article link: https://www.wllwen.com/shoufeilunwen/benkebiyelunwen/1468808.html