
Research on a Method for Predicting Protein-Protein Interactions Based on SVM and Protein Functional Annotation

Published: 2018-03-13 21:25

  Topic: protein-protein interaction  Approach: SVM  Source: Jilin University, 2017 master's thesis  Document type: degree thesis


[Abstract]: A protein-protein interaction (PPI) is the process by which two or more protein molecules associate through physicochemical means to form a protein complex. This thesis finds that current PPI data cover too few proteins to meet the practical needs of the life sciences: common PPI databases such as DIP contain only a little over 9,000 human PPIs, whereas commonly used gene expression datasets cover roughly 10,000 to 20,000 genes. In differential expression analysis, for example, many differentially expressed genes are absent from the known PPIs, so a large number of protein interactions remain to be predicted. Existing PPI data are obtained mainly by experiment, using techniques such as tandem affinity purification and yeast two-hybrid screening. These experiments are relatively accurate but time-consuming; computational biology methods can assist PPI prediction and substantially reduce experimental cost and time.
Machine-learning-based prediction of protein interactions generally performs well, but it has its own limitations. First, the machine learning algorithms used are supervised, so training and test sets of PPIs are indispensable; however, few PPI relationships are available for either class, and protein pairs confirmed not to interact are especially rare. Second, feature vectors are built from a single kind of information, either the amino acid sequence or gene co-expression, without considering other biological information related to the proteins themselves. Third, the computational cost is high.
To address these problems, this thesis proposes the following:
(1) For the feature vector representation used by the SVM, in addition to amino acid AC values, protein functional annotation data such as GO and KEGG are introduced to construct a new feature vector.
(2) Experimentally determined interacting pairs serve as the positive training set, and non-interacting pairs obtained by current experimental or computational means, located through online searches, serve as the negative training set; the SVM is trained and tested on both sets to predict PPIs.
(3) A computational algorithm for assisting PPI prediction, PPI_SPFA, is designed and implemented. To reduce the computational cost, a two-step strategy is used: pairs that are very unlikely to interact are filtered out first, and prediction is run only on the rest. Compared with other algorithms such as PPI_AC and iPPI-Esml, PPI_SPFA predicts PPIs with higher accuracy.
(4) Excluding the interactions already recorded in DIP and other existing PPI databases, the SVM is used to predict all remaining candidate pairs, producing a relatively complete PPI network.
Future work will focus on combining SVM with protein functional annotations such as GO and KEGG to explore and refine PPI prediction algorithms, improving their accuracy and response speed.
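The abstract does not give formulas for the feature vector or for the filtering step, so the following is only a minimal sketch of how such a pipeline might look, not the PPI_SPFA implementation. It assumes that "AC" refers to the auto covariance descriptor often used in sequence-based PPI prediction, that functional annotation similarity is measured as a Jaccard overlap of GO and KEGG term sets, and that the first-step filter discards pairs sharing no annotation. The function names, the use of the Kyte-Doolittle hydropathy scale as the single residue property, and the threshold are all illustrative choices.

```python
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy index; one illustrative residue property (assumption).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def auto_covariance(seq, max_lag=3, scale=HYDROPATHY):
    """Return [AC(1), ..., AC(max_lag)] for one property along a sequence.

    AC(lag) = 1/(n - lag) * sum_i (x_i - xbar) * (x_{i+lag} - xbar),
    where x is the property normalised over the 20 standard residues.
    Non-standard residue letters raise KeyError.
    """
    mu, sd = mean(scale.values()), pstdev(scale.values())
    x = [(scale[aa] - mu) / sd for aa in seq]
    n = len(x)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    xbar = mean(x)
    return [
        sum((x[i] - xbar) * (x[i + lag] - xbar) for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]


def jaccard(terms_a, terms_b):
    """Jaccard overlap of two annotation term collections (0.0 if both empty)."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_features(seq_a, seq_b, go_a, go_b, kegg_a, kegg_b, max_lag=3):
    """Concatenate AC values of both proteins with GO and KEGG overlaps."""
    return (
        auto_covariance(seq_a, max_lag)
        + auto_covariance(seq_b, max_lag)
        + [jaccard(go_a, go_b), jaccard(kegg_a, kegg_b)]
    )


def two_step_predict(pairs, classify, min_overlap=0.0):
    """Step 1: mark pairs with annotation overlap <= min_overlap as negative.
    Step 2: run the (more expensive) classifier only on the remaining pairs.

    `pairs` maps a pair name to (seq_a, seq_b, go_a, go_b, kegg_a, kegg_b);
    `classify` is any callable taking a feature list and returning truthy/falsy.
    """
    results = {}
    for name, fields in pairs.items():
        feats = pair_features(*fields)
        if max(feats[-2], feats[-1]) <= min_overlap:
            results[name] = False  # filtered out, classifier never called
        else:
            results[name] = bool(classify(feats))
    return results
```

In practice `classify` would wrap a trained SVM, for example the `predict` method of scikit-learn's `sklearn.svm.SVC` fitted on the positive and negative training sets. Keeping the classifier as a plain callable makes the filtering step independent of any particular SVM library.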
[Degree-granting institution]: Jilin University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: Q51; TP181





