Research on Noise Identification Models in Protein Function Annotation
Published: 2018-11-19 17:31
[Abstract]: Proteins are the principal carriers of life processes and perform a wide range of essential functions in organisms. Automatically annotating protein functions is a key problem in bioinformatics and one of the core problems of the post-genomic era. Accurate functional annotation greatly benefits research on disease mechanism analysis and regulation, new drug development, crop yield improvement, and bioenergy development. However, protein function annotations come from many sources, so noisy annotations are inevitably introduced. These noisy annotations can mislead the analysis and application of protein functions and reduce the accuracy of subsequent function prediction. Existing work on protein function prediction focuses on proteins whose functions are completely unknown and on the missing functions of partially annotated proteins; it rarely addresses the identification of noisy annotations. This thesis studies the identification of noisy protein function annotations. Its main contributions are as follows.

(1) A noisy-annotation identification algorithm based on semantic similarity and taxonomic similarity (NoisyGOA) is proposed. The method first computes the semantic similarity between proteins and the taxonomic similarity between Gene Ontology (GO) terms. Then, for each annotation of a protein, it sums the maximum taxonomic similarities between that annotation and the annotations of the protein's semantic neighbors. Finally, it selects the annotations with the lowest taxonomic similarity to these neighbors as the protein's noisy annotations. Experiments on both simulated and real noisy datasets for three model organisms (yeast, human, and Arabidopsis) demonstrate the effectiveness and superiority of the method. NoisyGOA shows not only that noisy protein annotations can be identified, but also that semantic similarity and taxonomic similarity both contribute to identifying them.

(2) NoisyGOA's semantic similarity computation is easily affected by the noisy annotations a protein already carries, and the method does not account for differences among annotations. The thesis therefore proposes a second method, NoGOA, based on evidence-attribute weighting and sparse representation. First, NoGOA stores the annotations in a protein-term association matrix, uses sparse representation to compute the semantic similarity between proteins, and makes a preliminary identification of a protein's noisy annotations by letting its semantic neighbors vote on its annotations. Second, for each evidence attribute, NoGOA collects statistics on noisy annotations from past periods and estimates a noise probability; it then weights each annotation in the association matrix by the noise probability of its evidence attribute and propagates the weights upward through the hierarchical relationships among GO terms. Finally, it identifies noisy annotations by integrating the preliminary similarity-based result with the weighted protein-term association matrix. Experiments on yeast, human, and Arabidopsis show that NoGOA identifies noisy annotations more accurately than existing algorithms. In addition, to verify the effect of NoGOA, the noisy annotations it identified were removed and protein function prediction was then performed on the cleaned annotations. The results show that this improves the accuracy of existing protein function prediction algorithms.
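The NoisyGOA scoring step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which similarity measures the thesis uses, so everything concrete here is an assumption made for illustration: Jaccard overlap of annotation sets stands in for the semantic similarity between proteins, Jaccard overlap of ancestor sets stands in for the taxonomic similarity between terms, and the hierarchy, protein names, and function names are hypothetical toy data rather than real GO content.

```python
from typing import Dict, List, Set, Tuple

# Toy term hierarchy (hypothetical, not real GO): term -> direct parents.
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "binding": {"root"},
    "dna_binding": {"binding"},
    "rna_binding": {"binding"},
    "catalysis": {"root"},
    "kinase": {"catalysis"},
}

# Toy protein annotations (hypothetical). "kinase" on P1 plays the noisy label.
ANNOTATIONS: Dict[str, Set[str]] = {
    "P1": {"dna_binding", "rna_binding", "kinase"},
    "P2": {"dna_binding", "rna_binding"},
    "P3": {"dna_binding", "binding"},
    "P4": {"catalysis"},
}


def ancestors(term: str) -> Set[str]:
    """Return the term together with all of its ancestors in PARENTS."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def term_similarity(t1: str, t2: str) -> float:
    """Assumed stand-in for taxonomic similarity: ancestor-set overlap."""
    return jaccard(ancestors(t1), ancestors(t2))


def protein_similarity(p: str, q: str) -> float:
    """Assumed stand-in for semantic similarity: annotation-set overlap."""
    return jaccard(ANNOTATIONS[p], ANNOTATIONS[q])


def rank_annotations(protein: str, k: int = 2) -> List[Tuple[str, float]]:
    """Score each annotation of `protein` against its k most similar proteins.

    The score of a term is the sum, over the neighbours, of the maximum
    term similarity between that term and the neighbour's annotations.
    Results are sorted ascending, so the first entries are noise candidates.
    """
    others = [q for q in ANNOTATIONS if q != protein]
    neighbours = sorted(
        others, key=lambda q: protein_similarity(protein, q), reverse=True
    )[:k]
    scores = {
        term: sum(
            max((term_similarity(term, other) for other in ANNOTATIONS[q]),
                default=0.0)
            for q in neighbours
        )
        for term in ANNOTATIONS[protein]
    }
    return sorted(scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    for term, score in rank_annotations("P1", k=2):
        print(f"{term}: {score:.3f}")
```

On this toy data the neighbours of P1 are P2 and P3, and the `kinase` annotation receives the lowest score because neither neighbour carries a term close to it in the hierarchy. A real pipeline would replace both similarity functions with the measures defined in the thesis and would need a rule for how many low-scoring annotations to flag.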
[Degree-granting institution]: Southwest University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: Q51; Q811.4
Article ID: 2342956
Article link: https://www.wllwen.com/shoufeilunwen/benkebiyelunwen/2342956.html