Construction and Application of a Management and Analysis System for Mass Spectrometry-Based Serum Peptidome Profiles

Published: 2018-08-15 14:10
[Abstract]: In the post-genomic era, the completion of genome sequencing for humans and other model organisms, together with major breakthroughs in mass spectrometry instruments and methods, has driven great progress in proteomics in both basic research and clinical application. Clinical proteomics is a recently emerged branch of proteomics that focuses on applying proteomic techniques to clinical medicine, including disease prevention, early diagnosis, and adjuvant therapy. It involves many data types, and the serum peptidome profile (hereafter "serum peptide profile") is one of the more important ones. Serum peptide profiling is a gel-free approach in applied clinical proteomics: the accurate masses of the peptides in serum are measured by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF/MS) or surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF/MS), and the resulting data are then processed with bioinformatics methods. By comparing the serum peptide profiles of patients and healthy controls, one can find proteins or peptides that are specifically expressed in disease, which in turn helps in studying disease mechanisms at the protein level.
Serum peptide profiling has broad prospects in biomarker discovery, early disease diagnosis, and personalized treatment. However, several factors must be considered when the technique is applied in clinical research. The first is sample selection. For the patient and normal control samples that a clinical study needs to collect, both inter-individual and intra-individual variation must be considered. Inter-individual variation among normal controls includes age, sex, ethnicity, family history, and medical history. Patient samples should preferably cover the full range of disease subtypes, and the information collected should be as complete as possible so that it meets the needs of mathematical model building and validation. The second is sample collection, which is a source of pre-analytical variation: differences in environmental conditions during collection, storage, and transport affect the samples. Because this variation is generally unrelated to disease, it can make disease-related differential proteins or peptides harder to find and ultimately affect the results of serum peptide profile analysis. The last is variation in instrumental analysis. The mass spectrometers used for serum peptide profiling are mainly MALDI-TOF/MS and SELDI-TOF/MS. Because many factors influence a mass spectrometry experiment, the raw spectra contain a large amount of noise and must be preprocessed to remove interference.
Serum peptide profiles have large numbers of both variables and samples. With data this complex, only bioinformatics methods can identify the set of peptide peaks closely associated with a disease and reveal the disease-related features in the profiles. However, existing data management and analysis tools can no longer meet current needs, and the high price of commercial software has also limited the wider use of serum peptide profiling to some extent. We therefore combined clinical proteomics with bioinformatics and developed BioSunMS, a management and analysis system for mass spectrometry-based serum peptide profiles. The system is built on the Eclipse plug-in architecture and written in Java. It is easy to distribute and extend, has a friendly interface, and runs across platforms. It supports managing clinical samples and mass spectra as well as preprocessing and modeling the spectra, which helps researchers carry out disease classification and subtyping studies quickly and conveniently. Finally, we illustrate the functions of BioSunMS with a sample classification and subtyping study based on the serum peptide profiles of lung cancer patients. The details are as follows.
1. Construction of the serum peptide profile database
The serum peptide profile database stores the serum peptide spectra, samples, and related clinical information of healthy individuals and of patients with a variety of tumors (including lung cancer, liver cancer, breast cancer, rectal cancer, prostate cancer, and leukemia). It mainly records sample source, diagnostic method, sample processing procedure, mass spectrometry detection method, and serum peptide mass spectrometry data. The database provides three main functions. Profile query: users can obtain the marker peaks of the serum peptide profile for a specific tumor and the corresponding peptide sequences. Data submission: researchers can submit disease serum peptide profile data collected in their own laboratories, which broadens the range of diseases covered by the database. Disease information analysis: laboratory staff can query the database directly with a clinically acquired serum peptide profile to obtain disease-related information.
2. Software development for serum peptide profile data processing and analysis
To support fast and accurate tumor classification and subtyping based on serum peptide profile data, we developed data processing and data analysis modules. The data processing module provides spectrum display, data import and export, format conversion, and preprocessing for the acquired mass spectrometry data. The data analysis module performs statistical analysis on the preprocessed data, finds characteristic peaks, builds serum peptide profile models, and classifies blind samples, enabling fast, automated analyses such as biomarker discovery.
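The abstract does not say which preprocessing steps BioSunMS applies to raw spectra. As a minimal sketch under assumed steps (moving-average smoothing, a constant median baseline, and total ion current normalization), preprocessing a single spectrum might look like the following in Python. All function names and parameters here are hypothetical and are not part of BioSunMS.

```python
from statistics import median


def smooth(intensities, window=5):
    """Moving-average smoothing; the window is clipped at the spectrum edges."""
    half = window // 2
    n = len(intensities)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out.append(sum(intensities[lo:hi]) / (hi - lo))
    return out


def subtract_baseline(intensities):
    """Subtract the median intensity as a crude constant baseline (assumed step)."""
    base = median(intensities)
    return [max(0.0, x - base) for x in intensities]


def normalize_tic(intensities):
    """Scale intensities so they sum to 1 (total ion current normalization)."""
    total = sum(intensities)
    if total == 0:
        return list(intensities)  # nothing to scale in an empty spectrum
    return [x / total for x in intensities]


def preprocess(intensities, window=5):
    """Smooth, remove the baseline, then normalize one spectrum."""
    return normalize_tic(subtract_baseline(smooth(intensities, window)))
```

Real MALDI-TOF or SELDI-TOF pipelines usually use more careful baseline estimation and add peak detection and alignment across spectra; those steps are left out here to keep the sketch short.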
3. Tumor classification and subtyping based on serum peptide profile data
Using common statistical and machine learning methods, including support vector machines (SVM), principal component analysis (PCA), genetic algorithms (GA), naïve Bayes, and partial least squares (PLS), and drawing on the data in the serum peptide profile database, we built a tumor classification and subtyping module. It also provides model parameter optimization to help researchers carry out tumor classification and subtyping studies.
4. Establishment of tumor-characteristic serum peptide profile models
This study was carried out in collaboration with the National Center for Instrumental Analysis. In earlier work, the center had collected high-resolution mass spectrometry data of serum peptide profiles from 1,000 healthy individuals and more than 2,000 patients with tumors such as lung cancer, liver cancer, breast cancer, rectal cancer, prostate cancer, and leukemia. On this basis, we used BioSunMS to analyze the serum peptide profiles of 254 lung cancer samples and 257 normal control samples in the database. First, we built a training set from the profiles of 150 lung cancer samples and 150 control samples; the remaining 104 lung cancer samples and 107 normal control samples formed the test set. Variables were selected by t-test with a threshold of P < 0.005, which yielded 74 characteristic peaks. Using these variables, we built a classification model for lung cancer serum peptide profiles with SVM and validated it on the test set. On the test set, the classification accuracy, sensitivity, and specificity were 92.3%, 96.3%, and 94.3%, respectively. Through this analysis we identified a number of mass spectral peaks characteristic of lung cancer and used them as features to build an early-diagnosis model for lung cancer based on mass spectrometry serum peptide profiles, a preliminary exploration of early lung cancer diagnosis.
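The variable selection step described above (a per-peak t-test with a P < 0.005 cutoff) and the three reported test-set metrics can be sketched as follows. This is an illustration only, not the BioSunMS implementation: the function names are hypothetical, Welch's form of the t-test is an assumption, and the P value uses a normal approximation that is reasonable only for large groups such as the 150 samples per class used here. The SVM step is omitted.

```python
from math import erfc, sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    diff = mean(a) - mean(b)
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # Both groups are constant: no evidence if equal, perfect separation otherwise.
        return 0.0 if diff == 0 else float("inf")
    return diff / se


def approx_two_sided_p(t):
    """Two-sided P value from the standard normal distribution.

    Assumption: large groups, so the t distribution is close to normal.
    Use an exact t-distribution CDF for small groups.
    """
    return erfc(abs(t) / sqrt(2))


def select_peaks(cases, controls, alpha=0.005):
    """Return indices of peaks whose group means differ with P < alpha.

    cases and controls are lists of spectra; each spectrum is a list of
    peak intensities aligned to the same peak list.
    """
    selected = []
    for j in range(len(cases[0])):
        a = [spectrum[j] for spectrum in cases]
        b = [spectrum[j] for spectrum in controls]
        if approx_two_sided_p(welch_t(a, b)) < alpha:
            selected.append(j)
    return selected


def evaluate(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }
```

Accuracy is a weighted average of sensitivity and specificity, so it always lies between them. The source does not report confusion-matrix counts, but as a purely arithmetic observation, the hypothetical call `evaluate(96, 8, 103, 4)` on the 104 lung cancer and 107 control test samples gives a sensitivity of 92.3%, a specificity of 96.3%, and an accuracy of 94.3%, which suggests the three reported values may be listed in a different order than their labels.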
In summary, this study built BioSunMS, a software system that integrates database management and analysis of mass spectrometry serum peptidome profiles. We used the system for a preliminary analysis of lung cancer serum peptide profile data and built an early-diagnosis model for lung cancer from these profiles, providing bioinformatics support for related research based on mass spectrometry serum peptide profiles.
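[Degree-granting institution]: Academy of Military Medical Sciences of the Chinese People's Liberation Army
[Degree level]: Doctoral
[Year degree awarded]: 2009
[Classification number]: R346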

Article ID: 2184447


Article link: https://www.wllwen.com/yixuelunwen/shiyanyixue/2184447.html

