Protein Secondary Structure Prediction Based on the Probabilistic Graphical Model HMM

Published: 2018-07-07 19:52

Topics: probability and statistics + HMM; Source: Master's thesis, Hebei University of Science and Technology, 2017

[Abstract]: Protein structure is closely related to protein function, and protein secondary structure is the basis on which higher-order spatial structures form. Protein secondary structure prediction has therefore become a hot topic in bioinformatics. It uses proteins whose secondary structures are known to build a model of the relationship between amino acid sequence and secondary structure, and then applies that model to predict the secondary structure of amino acid sequences whose structure is unknown. The hidden Markov model (HMM) is a probabilistic statistical model that researchers in China and abroad have applied to protein secondary structure prediction with some success. This thesis first predicts protein secondary structure with a 3-state HMM and a 7-state HMM; a comparison shows that the 7-state HMM predicts better than the 3-state HMM. It then proposes two improvements to the 7-state HMM prediction algorithm, one concerning the structural states and one concerning parameter training. First, because the 7-state HMM has no state for non-secondary structure, such a state is added, giving an 8-state HMM. Second, for the second kind of underflow that arises during parameter re-estimation, we depart from the usual practice: instead of preventing the underflow by some means, we use the mean of the best parameters obtained from a single round of re-estimation as the model parameters for predicting secondary structure. These two improvements raise prediction accuracy to some extent. The results indicate that choosing appropriate protein structural states and an appropriate training set matters for improving the accuracy of protein secondary structure prediction.
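The general pipeline the abstract describes (learn a sequence-to-structure model from labelled proteins, then decode unseen sequences with an HMM) can be illustrated with a minimal Python sketch. It assumes a 3-state alphabet of H (helix), E (strand) and C (coil), which the abstract does not spell out, and treats each residue as an emission. Parameters are estimated by direct counting from labelled sequences with add-one smoothing; this is a simpler, swapped-in technique, not the iterative re-estimation the thesis refers to. Decoding uses the Viterbi algorithm in log space, a common way to avoid underflow during decoding; it is not the thesis's own treatment of underflow in re-estimation. The names `TOY_PAIRS`, `train`, `viterbi` and `smoothed_log_table` are hypothetical, and the training pairs are made-up toy data, not real proteins.

```python
import math
from collections import Counter

# Assumed 3-state alphabet: helix (H), strand (E), coil (C).
STATES = "HEC"
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# Made-up toy (sequence, structure) pairs for illustration only; not real proteins.
TOY_PAIRS = [
    ("AELLKKAEEL", "HHHHHHHHHH"),
    ("VTVKVYVE", "EEEEEEEE"),
    ("GPNGSDG", "CCCCCCC"),
    ("AEELLKGPGVTVYV", "HHHHHCCCEEEEEE"),
]


def smoothed_log_table(counts, rows, cols):
    """Convert (row, col) counts into log P(col | row) with add-one smoothing."""
    table = {}
    for r in rows:
        total = sum(counts[r, c] for c in cols) + len(cols)
        for c in cols:
            table[r, c] = math.log((counts[r, c] + 1) / total)
    return table


def train(pairs):
    """Estimate start, transition and emission log-probabilities by counting
    labelled pairs (supervised counting, not Baum-Welch re-estimation)."""
    start, trans, emit = Counter(), Counter(), Counter()
    for seq, ss in pairs:
        start[ss[0]] += 1
        for prev, cur in zip(ss, ss[1:]):
            trans[prev, cur] += 1
        for aa, s in zip(seq, ss):
            emit[s, aa] += 1
    log_start = {s: math.log((start[s] + 1) / (len(pairs) + len(STATES)))
                 for s in STATES}
    log_trans = smoothed_log_table(trans, STATES, STATES)
    log_emit = smoothed_log_table(emit, STATES, AMINO_ACIDS)
    return log_start, log_trans, log_emit


def viterbi(seq, log_start, log_trans, log_emit):
    """Return the most probable state path for `seq`, computed in log space
    so long sequences do not underflow. Residues outside AMINO_ACIDS raise KeyError."""
    # score[s]: best log-probability of any path ending in state s at the current residue
    score = {s: log_start[s] + log_emit[s, seq[0]] for s in STATES}
    backpointers = []
    for aa in seq[1:]:
        best_prev, new_score = {}, {}
        for s in STATES:
            p = max(STATES, key=lambda q: score[q] + log_trans[q, s])
            best_prev[s] = p
            new_score[s] = score[p] + log_trans[p, s] + log_emit[s, aa]
        backpointers.append(best_prev)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=score.get)
    path = [state]
    for best_prev in reversed(backpointers):
        state = best_prev[state]
        path.append(state)
    return "".join(reversed(path))


if __name__ == "__main__":
    model = train(TOY_PAIRS)
    print(viterbi("AELLKGPG", *model))
```

Moving to a 7- or 8-state model in this sketch would mean enlarging `STATES` and relabelling the training structures accordingly; the abstract does not define those states, so any particular labelling would be an assumption.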
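[Degree-granting institution]: Hebei University of Science and Technology
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: Q51;O211.62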
