Research on Support Vector Machine-Based Prediction of Lysine Post-Translational Modification Sites
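Keywords: bioinformatics; protein post-translational modification; methylation; support vector machine; multi-label classification. Source: Dalian University of Technology, 2016 doctoral dissertation. Document type: degree thesis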
[Abstract]: Protein post-translational modification (PTM) is the covalent processing of a protein after translation, and it plays an important role in regulating protein conformational changes, activity, and function. Accurately identifying PTM sites is key to understanding the molecular mechanisms of post-translational modification. Over the past decade, machine-learning-based prediction of PTM sites has developed rapidly and has become an active research topic in bioinformatics. Based on the current state of PTM site prediction research, and working from protein sequence information, this thesis uses the support vector machine (SVM) and improved variants of it to address several open problems in PTM site prediction. The main contributions are as follows:

1. A model named iLM-2L is built to predict protein lysine methylation sites and methylation degrees, addressing the low accuracy of existing lysine methylation site predictors and their inability to predict methylation degree. First, to address the low accuracy of existing methods, the k-spaced amino acid pair composition encoding is applied to the construction of the methylation site prediction model, which improves site prediction accuracy. Second, because existing methods ignore methylation degree, degree prediction is formulated as a multi-label learning problem and trained with a multi-label SVM algorithm. Simulation experiments show that iLM-2L outperforms five existing methylation site predictors: MeMo, MASA, BPB-PPMS, PMeS, and iMethyl-PseAAC. In addition, iLM-2L can effectively predict methylation degree, a capability that existing predictors lack. Analysis of the optimal k-spaced amino acid pair composition features reveals potential sequence pattern preferences around lysine methylation sites. Finally, an online methylation site prediction service based on iLM-2L was built for researchers (http://123.206.31. 171/iLM 2L/).

2. A prokaryotic pupylation site prediction model, IMP-PUP, is built. Because the scarcity of pupylation site data limits the performance of existing predictors, a semi-supervised self-training SVM algorithm is proposed as the core classifier of IMP-PUP. The proposed algorithm mines the site information implicit in pupylated proteins in the PupDB database that lack site annotations, which enlarges the site data available for training and improves prediction performance. During iterative training, the algorithm uses a confidence function designed around a minimum-distance criterion to select reliable samples, overcoming the tendency of the original semi-supervised self-training SVM to introduce misclassifications early in training. Simulation results show that IMP-PUP outperforms three existing predictors: GPS-PUP, iPUP, and pbPUP. An online prediction platform based on IMP-PUP was built (http://123.206.31.171/IMP_PUP/).

3. A lysine phosphoglycerylation site prediction model, CKSAAP_PhoglySite, is built. First, to handle the imbalance between positive and negative training samples and the noise they contain, a fuzzy SVM algorithm is proposed. When designing the fuzzy membership function, the algorithm considers not only each sample's distance to its class center but also how tightly the sample's neighborhood is packed, which greatly improves its ability to handle noisy data. It also assigns a larger penalty factor to positive samples and a smaller one to negative samples, which mitigates the effect of class imbalance on the classifier. Second, to find an effective encoding for extracting sequence features around phosphoglycerylation sites, five commonly used feature types are analyzed and compared for their effect on prediction performance: amino acid composition, binary encoding, k-spaced amino acid pair composition, position-specific scoring matrix, and secondary structure. Finally, CKSAAP_PhoglySite is constructed using the proposed fuzzy SVM algorithm with k-spaced amino acid pair composition features. Jackknife tests show that the prediction accuracy of CKSAAP_PhoglySite is 14.2% higher than that of the existing tool Phogly-PseAAC. An online prediction server based on CKSAAP_PhoglySite was built (http://123.206.31.171/CKSAAP_PhoglySite/).
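The k-spaced amino acid pair composition encoding used for iLM-2L and CKSAAP_PhoglySite can be illustrated with a minimal Python sketch. It shows the encoding as it is commonly described: for each gap k, count every residue pair separated by k positions in a sequence window and normalize by the number of pair positions. The function name `cksaap`, the default `k_max=4`, the 20-letter alphabet, and the handling of padding symbols are assumptions made for illustration; the abstract does not state the window length, the range of k, or the normalization the thesis actually used.

```python
from itertools import product

# The 20 standard amino acids (assumed alphabet; padding symbols are ignored).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def cksaap(window, k_max=4):
    """Encode a peptide window as k-spaced amino acid pair frequencies.

    For each gap k in 0..k_max, every pair (window[i], window[i + k + 1]) is
    counted and divided by the number of pair positions, giving 400 values
    per gap and 400 * (k_max + 1) values in total.
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_positions = len(window) - k - 1
        for i in range(n_positions):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # skip pairs containing non-standard symbols
                counts[pair] += 1
        total = max(n_positions, 1)  # avoid division by zero for short windows
        features.extend(counts[p] / total for p in PAIRS)
    return features


# Hypothetical window centered on a lysine (K); the sequence is made up.
vector = cksaap("ARTKQTARKSTGG", k_max=4)
print(len(vector))  # 400 pairs for each of the 5 gap values
```

The resulting fixed-length vector can be passed to any SVM implementation as the feature representation of a candidate site.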
[Degree-granting institution]: Dalian University of Technology
[Degree level]: Doctoral
[Year conferred]: 2016
[Classification number]: Q51; TP18
Article ID: 1545033
Article link: https://www.wllwen.com/shoufeilunwen/jckxbs/1545033.html