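Application of the Autoassociative Neural Network Algorithm to the Protein Structure Sampling Space
Published: 2018-05-25 15:17
Topics: homology modeling + missing values; Source: master's thesis, North China Electric Power University (Beijing), 2017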
[Abstract]: Protein structure prediction is an important part of research on protein structure and function, and it matters for protein drug design and biopharmaceuticals. If the structures of some proteins in a homologous family are known, the structures of other family members whose sequences are known but whose structures are not can be predicted. Sequence alignment turns sequences of unequal length into equal-length sequences by inserting gaps. The gap positions indicate that the aligned sequences evolved from a common ancestor through insertions and deletions, and so reflect the variation and mutation that occur during biological evolution. Gaps strongly affect the scale and accuracy of homology modeling, so studying missing values in protein sequence alignments is important. Earlier work filled missing protein data with methods such as the nearest-neighbor algorithm and the self-organizing neural network algorithm. Both methods fill the missing data reasonably; the average scale of study rose from 62.9% to 82.7%, and accuracy improved from 1.65 Å to 0.88 Å. However, because protein structure space is complex, predicting the protein sampling space requires a very large amount of computation, which makes the process time-consuming. We therefore aim to increase speed and reduce computation while still filling missing values reasonably. Building on the nonlinear principal component analysis (NLPCA) algorithm of autoassociative neural networks (AANN), and taking into account the complexity of the protein sampling space and the growth rate of protein sequence databases, this thesis uses an improved inverse nonlinear network model (Inverse NLPCA Model) to fill missing values and improve efficiency, and optimizes the network model with the conjugate gradient algorithm to speed up computation further.
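The idea of filling missing values with an inverse NLPCA model trained by conjugate gradient can be illustrated with a minimal sketch. It is not the thesis's improved model. The function name `fit_inverse_nlpca`, the single tanh hidden layer, the layer sizes, the use of NaN to mark missing entries, and the choice of SciPy's nonlinear conjugate gradient routine are all assumptions made for illustration. Only the decoder half of the autoassociative network is kept: the latent scores of each sample are optimized together with the decoder weights, and the loss counts observed entries only, so missing entries never influence training and can be read off the reconstruction afterwards.

```python
import numpy as np
from scipy.optimize import minimize


def fit_inverse_nlpca(X, n_components=1, n_hidden=4, seed=0, max_iter=500):
    """Fit a decoder-only (inverse) NLPCA model; NaN in X marks a missing entry.

    Illustrative sketch: returns X with missing entries replaced by the
    decoder's reconstruction, plus the SciPy optimization result.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    mask = ~np.isnan(X)            # True where a value was observed
    X0 = np.where(mask, X, 0.0)    # placeholder zeros are masked out of the loss

    # Flat parameter layout: latent scores Z, then decoder W1, b1, W2, b2.
    shapes = [(n, n_components), (n_components, n_hidden), (n_hidden,),
              (n_hidden, d), (d,)]
    sizes = [int(np.prod(s)) for s in shapes]
    splits = np.cumsum(sizes)[:-1]

    def unpack(theta):
        return [p.reshape(s) for p, s in zip(np.split(theta, splits), shapes)]

    def decode(Z, W1, b1, W2, b2):
        return np.tanh(Z @ W1 + b1) @ W2 + b2

    def loss_and_grad(theta):
        Z, W1, b1, W2, b2 = unpack(theta)
        H = np.tanh(Z @ W1 + b1)
        R = (H @ W2 + b2 - X0) * mask    # residuals on observed entries only
        loss = 0.5 * np.sum(R ** 2)
        # Backpropagate through the decoder and into the latent scores.
        gW2 = H.T @ R
        gb2 = R.sum(axis=0)
        dA = (R @ W2.T) * (1.0 - H ** 2)
        gW1 = Z.T @ dA
        gb1 = dA.sum(axis=0)
        gZ = dA @ W1.T
        grad = np.concatenate([g.ravel() for g in (gZ, gW1, gb1, gW2, gb2)])
        return loss, grad

    rng = np.random.default_rng(seed)
    theta0 = 0.1 * rng.standard_normal(sum(sizes))
    # Nonlinear conjugate gradient on latent scores and weights jointly.
    result = minimize(loss_and_grad, theta0, jac=True, method="CG",
                      options={"maxiter": max_iter})
    X_hat = decode(*unpack(result.x))
    # Keep observed values; take reconstructed values only where data was missing.
    return np.where(mask, X, X_hat), result
```

Calling `fit_inverse_nlpca(X)` on a samples-by-features array with NaN at the gap positions returns the filled array and the optimizer result. The sketch omits weight regularization and any scaling of the input, both of which a practical model would likely need.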
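[Degree-granting institution]: North China Electric Power University (Beijing)
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: Q51; TP183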
Article No.: 1933682
Article link: https://www.wllwen.com/kejilunwen/zidonghuakongzhilunwen/1933682.html