
Machine-Learning-Based Prediction of Apoptosis Protein Subcellular Localization

Published: 2018-07-01 11:15

  Topic: apoptosis proteins + subcellular localization; Source: Zhengzhou University of Light Industry, 2017 master's thesis


[Abstract]: Apoptosis, also called programmed cell death, is the final stage of a cell's life and an important part of many biological processes; it plays an important role in maintaining tissue homeostasis. Apoptosis proteins play a key role in the mechanism of programmed cell death, and knowing their subcellular locations helps us understand that mechanism. As the number of known proteins grows exponentially, annotation by biological experiment can no longer meet researchers' needs, and more and more researchers turn to machine learning to predict protein subcellular localization. This thesis uses machine learning methods to study the prediction of apoptosis protein subcellular localization in depth and makes the following three contributions.

(1) Because feature extraction based on sequence information can no longer improve prediction performance, the thesis represents each protein by the GO (Gene Ontology) annotation information of the protein and its homologs instead of by sequence information. The experimental results show that the proposed method predicts the subcellular location of apoptosis proteins significantly better than existing methods. To offer the predictor to more researchers, an online prediction website was built.

(2) The CL317 apoptosis protein dataset is severely imbalanced. Previous machine learning research shows that applying conventional learning algorithms directly tends to bias the classifier toward the majority classes, which leads to poor performance on the minority classes. To address this, the thesis builds a new apoptosis protein subcellular localization predictor, GOIL-Apo. It combines random under-sampling with multi-class support vector machines into an ensemble of under-sampled SVMs to handle the imbalance in CL317, and it constructs a GO vector subspace to avoid the curse of dimensionality that comes with using all GO terms. The experimental results show that addressing the imbalance effectively improves prediction and that the predictor significantly outperforms existing methods.

(3) Earlier work focused only on apoptosis proteins with a single subcellular location and ignored those with multiple locations. The thesis goes a step further and studies location prediction for multi-location apoptosis proteins: it constructs a dataset of apoptosis proteins with multiple subcellular locations and proposes a new multi-label algorithm that uses label-specific features. The experimental results show that selecting the features most relevant to each location models the multi-location nature of these proteins well and achieves good performance. The thesis presents itself as the first work in this field to consider multi-location apoptosis proteins and as a useful reference for further research on their prediction.
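The pipeline in contribution (2) can be illustrated with a minimal sketch. It is not the GOIL-Apo implementation: the abstract does not say how proteins are encoded over the GO subspace or how ensemble members are combined, so binary presence vectors and majority voting are assumptions here. To keep the example dependency-free, a nearest-centroid classifier stands in for the multi-class SVM. All function and class names are hypothetical, and the annotation sets are toy data.

```python
import random
from collections import Counter, defaultdict

def build_go_subspace(go_annotations):
    """Index only the GO terms that occur in the training set."""
    terms = sorted({t for terms in go_annotations for t in terms})
    return {term: i for i, term in enumerate(terms)}

def to_vector(go_terms, subspace):
    """Binary presence vector (assumed encoding); unseen terms are ignored."""
    vec = [0.0] * len(subspace)
    for t in go_terms:
        if t in subspace:
            vec[subspace[t]] = 1.0
    return vec

def undersample(X, y, rng):
    """Randomly keep as many samples per class as the smallest class has."""
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    n_min = min(len(idx) for idx in by_class.values())
    keep = [i for idx in by_class.values() for i in rng.sample(idx, n_min)]
    return [X[i] for i in keep], [y[i] for i in keep]

class CentroidClassifier:
    """Stand-in base learner (NOT an SVM): predicts the nearest class centroid."""
    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for j, v in enumerate(x):
                acc[j] += v
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict_one(self, x):
        def dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[c]))
        return min(sorted(self.centroids), key=dist)

class UnderSampledEnsemble:
    """Trains each member on its own balanced random subset; majority vote."""
    def __init__(self, n_members=5, seed=0):
        self.n_members = n_members
        self.rng = random.Random(seed)
        self.members = []

    def fit(self, X, y):
        self.members = []
        for _ in range(self.n_members):
            Xs, ys = undersample(X, y, self.rng)
            self.members.append(CentroidClassifier().fit(Xs, ys))
        return self

    def predict_one(self, x):
        votes = Counter(m.predict_one(x) for m in self.members)
        return votes.most_common(1)[0][0]

# Toy, imbalanced training set: each protein is a set of GO term IDs.
train_go = [
    {"GO:0005634", "GO:0003677"},
    {"GO:0005634"},
    {"GO:0005634", "GO:0006915"},
    {"GO:0005634", "GO:0003677", "GO:0006915"},
    {"GO:0005739", "GO:0006915"},
    {"GO:0005739"},
]
train_y = ["nucleus"] * 4 + ["mitochondrion"] * 2

subspace = build_go_subspace(train_go)
X = [to_vector(terms, subspace) for terms in train_go]
model = UnderSampledEnsemble(n_members=5, seed=0).fit(X, train_y)
prediction = model.predict_one(to_vector({"GO:0005739"}, subspace))
```

Restricting the index to terms seen in training keeps the vector length at the number of distinct training terms rather than the size of the full ontology. Because every member sees the same number of samples per class, no single member is dominated by the majority class. Replacing `CentroidClassifier` with a real multi-class SVM would only require an object exposing the same `fit` and `predict_one` methods.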
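[Degree-granting institution]: Zhengzhou University of Light Industry
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: Q25; TP181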





Article link: https://www.wllwen.com/kejilunwen/zidonghuakongzhilunwen/2087600.html

