Prediction of Protein Post-Translational Modification Sites Based on Ensemble Learning and Flexible Neural Trees

Published: 2018-10-22 17:20
[Abstract]: Protein post-translational modifications (PTMs) play a vital role in cellular life processes. Different PTMs influence and coordinate with one another, jointly maintaining and promoting normal cellular activity. However, identifying PTMs experimentally is often laborious and inefficient, so effective bioinformatics prediction tools are needed to make the identification of modification sites more efficient. Taking protein sequences as the basic object of study and combining several feature extraction methods, this thesis uses computational methods to predict the sites of two PTMs: phosphorylation and phosphoglycerylation. For phosphorylation, starting from the function of the modification, protein sequences related to signal transduction were extracted from a phosphorylation database to build the data set. For feature extraction, a new method is proposed that incorporates grouping information on the physicochemical properties of amino acid residues into features based on the frequency with which residues occur in a sliding window. Experiments show that, for the same modification sites and the same prediction model, adding the physicochemical grouping information markedly improves the results: the prediction accuracy of a neural network optimized by particle swarm optimization rose from about 58% to 86%. A preliminary experiment on how the length of the residue window affects the results found that prediction was best when each protein sub-sequence (window) contained 23 amino acid residues.
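The feature extraction described above can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the four physicochemical groups, the `X` padding character, the function names, and the choice to simply concatenate group frequencies onto residue frequencies are all assumptions, since the abstract does not specify them. Only the 23-residue window comes from the text.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical grouping by physicochemical property (assumed for illustration;
# the abstract does not list the groups that were actually used).
GROUPS = {
    "hydrophobic": "AVLIMFWP",
    "polar": "GSTCYNQ",
    "acidic": "DE",
    "basic": "KRH",
}


def extract_window(sequence, site, size=23, pad="X"):
    """Return the `size`-residue window centred on 0-based index `site`.

    Positions that fall outside the sequence are filled with `pad`.
    """
    half = size // 2
    padded = pad * half + sequence + pad * half
    return padded[site:site + size]


def encode(window):
    """Encode a window as 20 residue frequencies followed by 4 group frequencies."""
    counts = Counter(window)
    n = len(window)
    residue_freq = [counts[a] / n for a in AMINO_ACIDS]
    group_freq = [sum(counts[a] for a in members) / n for members in GROUPS.values()]
    return residue_freq + group_freq


# Usage: encode the window around a candidate serine at index 15 of a toy sequence.
toy_sequence = "ACDEFGHIKLMNPQRSTVWYACDEFGHIKL"
features = encode(extract_window(toy_sequence, 15))
```

Each candidate site yields a fixed-length vector (24 values here) regardless of where it sits in the protein, which is what allows the same classifier to be applied to every window.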
Next, the data set was organized for ten-fold cross-validation, and three models (a neural network, a support vector machine, and a flexible neural tree) were combined by ensemble learning on features produced by the new extraction method. The three models were combined by majority voting. The results show that the ensemble reaches a prediction accuracy of 87.50%, a considerable improvement over earlier studies. For phosphoglycerylation, a flexible neural tree model was used to predict modification sites, and the results were compared with the most recent work in the field. The data set was processed by ten-fold cross-validation, and the window size for the protein sub-sequences was taken from the conclusions of earlier researchers. The results show that the flexible neural tree has a clear advantage when positive and negative samples are equal in number: its prediction accuracy exceeds 90%, well above previously published results. The highest Matthews correlation coefficient (MCC) obtained by the flexible neural tree was 0.807. As the proportion of negative samples increased, accuracy improved but the MCC gradually fell. When the data set contained all samples, the MCC was 0.326, a large drop, which shows that imbalance between positive and negative samples strongly affects the results. In summary, using the new feature extraction method, this thesis applied an ensemble of several prediction models to the prediction of protein phosphorylation sites, and the ensemble performed well.
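The majority-vote combination of the three models might look like the sketch below. The prediction lists are made-up placeholders rather than results from the thesis, and `majority_vote` is a hypothetical helper name; the three classifiers themselves are not implemented here.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample by majority vote."""
    combined = []
    for votes in zip(*predictions):
        # most_common(1) returns the label with the highest vote count.
        label, _ = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical outputs (1 = modified site, 0 = unmodified) for five windows.
nn_pred = [1, 0, 1, 1, 0]   # neural network
svm_pred = [1, 1, 0, 1, 0]  # support vector machine
fnt_pred = [0, 0, 1, 1, 1]  # flexible neural tree

ensemble = majority_vote([nn_pred, svm_pred, fnt_pred])
```

With three binary classifiers a tie cannot occur, so no tie-breaking rule is needed in this setting.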
In addition, a flexible neural tree model was applied to the prediction of phosphoglycerylation sites; compared with the most recent published results, it substantially improved prediction performance.
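The effect of class imbalance on the correlation coefficient reported above (the Matthews correlation coefficient, MCC) can be illustrated with a small calculation. The confusion-matrix counts below are invented for illustration and are not the thesis's data; they only show how accuracy can rise while MCC falls once negatives dominate.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all samples classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical balanced set: 100 positives, 100 negatives.
balanced = dict(tp=90, tn=90, fp=10, fn=10)
# Hypothetical imbalanced set: 100 positives, 2000 negatives.
imbalanced = dict(tp=50, tn=1980, fp=20, fn=50)

for name, counts in (("balanced", balanced), ("imbalanced", imbalanced)):
    print(name, round(accuracy(**counts), 3), round(mcc(**counts), 3))
```

In the imbalanced case most correct predictions are true negatives, which inflates accuracy, while MCC still penalises the missed positives. That is why MCC is the more informative figure when negative samples greatly outnumber positive ones.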
【Degree-granting institution】: University of Jinan
【Degree level】: Master's
【Year degree conferred】: 2017
【Classification number】: Q51; TP18



