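Prediction of Antigenic Variation of Influenza A Virus Based on Hemagglutinin Protein Sequences
Published: 2018-01-15 21:12
Keywords: Prediction of Antigenic Variation of Influenza A Virus Based on Hemagglutinin Protein Sequences  Source: Zhejiang Sci-Tech University, 2017 master's thesis  Document type: dissertation
More related articles: antigenic variation, hemagglutinin protein sequence, matrix completion, random forest, substitution matrix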
[Abstract]: Timely identification of antigenic variation in newly emerging influenza viruses is essential for influenza vaccine design, influenza surveillance, and public health. Traditional experimental methods, such as the hemagglutination inhibition (HI) assay, give good results but have several drawbacks: they are time-consuming and labor-intensive, so they cannot support timely and effective influenza surveillance; some experiments cannot be completed, which leaves the available serological data sparse (with many missing values); and the measurements are subject to human and systematic error, so many values in the final serological data are too low. To speed up the prediction of influenza virus antigenic variation and improve its quality, bioinformatics methods based on the influenza virus hemagglutinin (HA) protein sequence have been proposed in growing numbers. This thesis analyzes and predicts the antigenic variation of influenza A virus by extracting sequence information from the hemagglutinin protein and combining it with the corresponding serological data. The main contributions are as follows:
1. A review of recent research, in China and abroad, on predicting the antigenic variation of influenza A virus, focusing on feature extraction from hemagglutinin protein sequences and on prediction and classification algorithms. Common feature representations include binary encoding and grouping by the physicochemical properties of amino acids. The prediction and classification algorithms used include matrix completion, K-nearest neighbors, support vector machines, logistic regression, and the lasso.
2. A joint random forest algorithm based on hemagglutinin protein sequences (JRFR) for directly predicting the antigenic distance of influenza A viruses. The algorithm combines 94 amino acid substitution matrices with HA1 to predict antigenic distance; it improves prediction accuracy and also predicts the antigenic variation of new virus sequences well.
3. A matrix completion algorithm based on hemagglutinin protein sequences (BMCSI) for filling in and correcting HI assay data that are overly sparse and contain many unstable values, so that the antigenic distances between viruses can be computed more accurately. Multidimensional scaling (MDS) then maps these antigenic distances into two-dimensional space to visualize them. On data from 1968-2003, the prediction accuracy of the method is 37% higher than in previous studies (RMSE = 0.6586).
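As a rough illustration of the sequence-based idea in item 2, the sketch below encodes a pair of aligned sequences as per-site substitution scores and computes the RMSE used to report accuracy. The toy matrix, the example sequences, and the function names `score`, `pair_features`, and `rmse` are assumptions made for illustration; the abstract does not describe how JRFR actually builds its features, so this is not the thesis's method.

```python
from math import sqrt

# Hypothetical toy substitution scores for a 4-letter alphabet. A real run
# would load published 20x20 amino acid substitution matrices instead.
TOY_MATRIX = {
    ("A", "A"): 4, ("A", "K"): -1, ("A", "N"): -2, ("A", "T"): 0,
    ("K", "K"): 5, ("K", "N"): 0, ("K", "T"): -1,
    ("N", "N"): 6, ("N", "T"): 0,
    ("T", "T"): 5,
}


def score(matrix, a, b):
    """Look up a symmetric substitution score for residues a and b."""
    return matrix[(a, b)] if (a, b) in matrix else matrix[(b, a)]


def pair_features(seq_a, seq_b, matrix):
    """Encode two aligned sequences as one substitution score per site."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [score(matrix, a, b) for a, b in zip(seq_a, seq_b)]


def rmse(predicted, observed):
    """Root-mean-square error between predicted and observed distances."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return sqrt(sum(squared) / len(squared))


# Two made-up aligned fragments that differ at the third site (N vs. T).
features = pair_features("AKNT", "AKTT", TOY_MATRIX)
print(features)

# Made-up predicted and measured antigenic distances.
print(rmse([1.0, 2.0, 4.0], [1.0, 2.0, 2.0]))
```

Feature vectors of this kind, one per virus pair and one set per substitution matrix, could then be passed to a regressor such as a random forest, with measured antigenic distances as targets.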
[Degree-granting institution]: Zhejiang Sci-Tech University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: R373.13
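[Similar literature]
Related conference papers (top 1)
1 Zhang Cui (张萃); Variability of the influenza A (H1N1) virus and the immunogenicity of its new vaccine [A]; Proceedings of the 5th National Academic Symposium on Traditional Chinese Medicine Immunology and the Interdisciplinary Conference on Environment, Immunity, and Tumor Prevention and Treatment [C]; 2009
Related master's theses (top 1)
1 Li Xianhong (李显红); Prediction of Antigenic Variation of Influenza A Virus Based on Hemagglutinin Protein Sequences [D]; Zhejiang Sci-Tech University; 2017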
Article ID: 1430065
Article link: https://www.wllwen.com/shoufeilunwen/mpalunwen/1430065.html