Application of High-Throughput Sequencing to the Study of HIV Transmission Networks among MSM
Topics: human immunodeficiency virus + transmission network; Source: master's thesis, Chinese Center for Disease Control and Prevention, 2017
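[Abstract]: A thorough understanding of the characteristics and dynamics of human immunodeficiency virus type 1 (HIV-1) transmission networks supports HIV-1 surveillance, intervention, and treatment. At present, HIV-1 transmission network analysis relies mainly on sequences of the HIV-1 polymerase (pol) gene region, usually obtained by direct sequencing of PCR products. For describing the transmission network characteristics of recently infected individuals, or for analyzing network dynamics, the envelope protein (env) and capsid protein (gag) gene fragments may be better suited than pol because they evolve faster. Direct PCR sequencing also yields limited information: it generally detects only the dominant quasispecies sequence, so it cannot be used to infer the direction of HIV-1 transmission at the quasispecies level. High-throughput sequencing can detect low-frequency variants within an individual and can therefore suggest the direction of transmission. This study analyzed transmission sub-networks using HIV-1 pol, gag, and env sequences obtained by both first-generation sequencing and HiSeq high-throughput sequencing (hereafter HiSeq sequencing), and assessed the value of each approach.

Objective: (1) To determine parameters for constructing transmission networks from HIV-1 gag and env sequences; (2) to assess the value of HIV-1 pol, gag, and env sequences for dynamic analysis of transmission sub-networks; (3) to establish a HiSeq high-throughput sequencing method for quasispecies analysis of HIV-1 pol, gag, and env sequences; (4) to apply HiSeq sequencing to dynamic analysis of transmission sub-networks based on HIV-1 pol, gag, and env sequences.

Materials and methods:
1. Study subjects. By simple random sampling, 100 MSM with acute infection enrolled from 2010 to 2012 were drawn from the relatively closed cohort of newly HIV-1-infected MSM at Beijing You'an Hospital. None had received antiviral treatment before sampling. Peripheral blood mononuclear cells (PBMC) and plasma (1.5 ml per tube) were separated and stored at -8 °C [sic] until use.
2. Experimental methods. (1) DNA was extracted from PBMC samples. (2) Target fragments were amplified from the DNA by nested PCR. (3) PCR products were sequenced directly and genotyped. (4) Dynamic transmission sub-network analysis was performed on the pol, gag, and env sequences. (5) Primers for the pol, gag, and env regions were designed for HiSeq sequencing, reaction conditions were optimized, and target fragments were amplified by nested PCR. (6) PCR products were purified, DNA libraries were constructed, and HiSeq sequencing was performed. (7) HiSeq data were preprocessed, and genetic divergence and phylogenetic analyses of the HIV quasispecies sequences were carried out to assess their value for dynamic analysis of HIV transmission sub-networks.

Results:
1. HIV-1 subtypes and prevalence. The HIV-1 subtypes identified were CRF01_AE, CRF07_BC, B, B', CRF55_01B, CRF65_cpx, and unknown subtypes, accounting for 42.7% (41/96), 25.0% (24/96), 15.6% (15/96), 1.0% (1/96), 1.0% (1/96), 3.1% (3/96), and 11.6% (11/96), respectively. CRF55_01B and CRF65_cpx were detected in Beijing for the first time.
2. Transmission networks from first-generation pol, gag, and env sequences. Network analysis of the pol sequences from 70 samples produced 5 transmission clusters comprising 11 nodes and 7 edges, a clustering rate of 15.71% (11/70). Fisher's exact test showed no statistically significant difference in sample linkage by HIV-1 subtype, age, education, marital status, or CD4+ T-cell count (P_sub = 0.2058, P_age = 0.8652, P_edu = 1.000, P_mar = 1.0000, P_CD4 = 0.7568). With the bootstrap value fixed at ≥90%, within-cluster genetic distance (GD) thresholds were explored for gag and env. For gag, at GD ≤0.5%, 1.5%, 2.5%, and 3.5%, sample linkage did not differ significantly from that of the pol sequences (P_gag0.5 = 0.0257, P_gag1.5 = 0.7083, P_gag2.5 = 0.0876, P_gag3.5 = 0.0156; α_g = 0.01). For env, at GD ≤0.5%, 1.5%, 2.5%, 3.5%, and 4.5%, sample linkage did not differ significantly from that of the pol sequences at GD ≤1.5% and bootstrap ≥90% (P_env0.5 = 0.0257, P_env1.5 = 0.0811, P_env2.5 = 0.3824, P_env3.5 = 0.3749, P_env4.5 = 0.0160; α_e = 0.0083). Sample linkage was then compared among the three regions at the same GD threshold, each with bootstrap ≥90%. At GD ≤1.5%, gag and env differed significantly (P_ge1.5 = 0.0125). At GD ≤3.0%, pol differed significantly from both gag and env (P_pg3.0 = 0.0008, P_pe3.0 < 0.0001). At GD ≤4.5%, env differed significantly from both pol and gag (P_pe4.5 < 0.0001, P_ge4.5 = 0.0005). Dynamic transmission sub-network analysis of the pol, gag, and env sequences from 33 samples identified, respectively, two samples (16014, 16035), three samples (16014, 16017, 16035), and four samples (16014, 16017, 16064, 16035) that played an important role in HIV-1 transmission, and suggested possible transmission routes.
3. Establishing the HiSeq method for transmission network analysis. Using env as the representative region, the number of quasispecies sequences needed per sample was explored. The 5, 10, and 20 most abundant quasispecies sequences in each sample were selected (groups A, B, and C). Mean between-sample genetic divergence did not differ significantly among the three groups (P = 0.5889), and phylogenetic trees showed that group A gave results consistent with groups B and C. The top 5 quasispecies sequences per sample were therefore used for transmission network analysis.
4. Transmission networks from HiSeq pol, gag, and env sequences. For subtype CRF01_AE, 40, 42, and 42 samples were successfully sequenced for pol, gag, and env, respectively. The pol analysis yielded 10 sub-networks, 7 samples that played an important role in HIV-1 transmission (16001, 16003, 16014, 16029, 16082, 16088, 16097), and 3 main routes. The gag analysis yielded 16 sub-networks, 10 important samples (16004, 16011, 16014, 16032, 16052, 16056, 16061, 16082, 16089, 16097), and 3 main routes. The env analysis yielded 34 sub-networks, 13 important samples (16007, 16011, 16018, 16032, 16056, 16060, 16061, 16064, 16070, 16082, 16085, 16088, 16092), and 3 main routes. For subtype CRF07_BC, 25, 29, and 29 samples were successfully sequenced for pol, gag, and env. These analyses identified, respectively, 2 important samples (16016, 16023), 6 important samples (16024, 16038, 16054, 16065, 16075, 16090), and 10 important samples (16019, 16028, 16031, 16038, 16050, 16065, 16067, 16069, 16090, 16098), together with one route, and suggested possible transmission routes.

Conclusions:
1. With the bootstrap value fixed at ≥90%, the GD thresholds obtained for transmission network analysis were GD ≤3.5% for HIV-1 gag and GD ≤4.5% for env.
2. In dynamic transmission sub-network analysis of first-generation sequences, the amount of sample linkage information obtained ranked, from highest to lowest, env, gag, pol.
3. A quasispecies analysis method for HIV-1 pol, gag, and env sequences based on HiSeq high-throughput sequencing was successfully established.
4. HiSeq high-throughput sequencing made the transmission sub-network results more precise and more clearly resolved, which helps in understanding the dynamics of HIV-1 transmission sub-networks.
5. HiSeq sequencing is relatively simple to perform and has practical value for HIV-1 transmission network analysis.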
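The clustering rule reported in the abstract (two sequences are linked when their genetic distance is at most a GD threshold and the bootstrap value is at least 90%) can be illustrated with a minimal sketch. This is not the thesis's actual pipeline, which the abstract does not describe. The sample names, distances, and support values below are hypothetical, the function names are invented for illustration, and bootstrap support is assumed to have been computed beforehand by a separate phylogenetic analysis.

```python
from collections import defaultdict


def build_clusters(samples, pairs, gd_max=0.015, support_min=0.90):
    """Link two samples when genetic distance <= gd_max and the
    (precomputed, assumed) bootstrap support >= support_min.
    Returns clusters (connected components of size >= 2) and the edges."""
    adjacency = defaultdict(set)
    edges = []
    for a, b, gd, support in pairs:
        if gd <= gd_max and support >= support_min:
            adjacency[a].add(b)
            adjacency[b].add(a)
            edges.append((a, b))

    seen, clusters = set(), []
    for start in samples:
        # Skip samples already assigned or with no qualifying link.
        if start in seen or start not in adjacency:
            continue
        stack, component = [start], set()
        while stack:  # depth-first walk over linked samples
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters, edges


def clustering_rate(clusters, n_samples):
    """Fraction of all samples that fall inside any cluster."""
    return sum(len(c) for c in clusters) / n_samples


# Hypothetical toy data: (sample A, sample B, genetic distance, bootstrap support)
samples = ["S1", "S2", "S3", "S4", "S5", "S6"]
pairs = [
    ("S1", "S2", 0.010, 0.95),  # linked
    ("S2", "S3", 0.014, 0.92),  # linked
    ("S4", "S5", 0.012, 0.80),  # rejected: support below 90%
    ("S5", "S6", 0.030, 0.99),  # rejected: distance above 1.5%
]
clusters, edges = build_clusters(samples, pairs)
rate = clustering_rate(clusters, len(samples))
print(clusters, len(edges), f"{rate:.2%}")
```

The same rate definition reproduces the figure reported for the pol network: 11 clustered nodes out of 70 samples gives 11/70, or 15.71%. Raising `gd_max` to 0.035 or 0.045 would correspond to the gag and env thresholds given in the conclusions.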
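[Degree-granting institution]: Chinese Center for Disease Control and Prevention
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: R512.91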
Article ID: 1952961
Article link: https://www.wllwen.com/yixuelunwen/nfm/1952961.html