Basic Bioinformatics Research on Proteome Expression Profiles and Proteome-Wide Isoelectric Point Distribution

Published: 2019-03-30 21:21

[Abstract]: Human genome research has given us a map of the human genome, but the functional information that a genome map alone can provide is very limited. To systematically explain the functions of the encoded genes and the relationships among them, proteome research has attracted increasing attention. The most fundamental goal of proteome research is to establish proteome expression profiles for the tissues or organs of an organism and then to systematically elucidate the biological significance of those profiles. A variety of proteome identification instruments can now identify proteins at relatively high throughput, and these techniques have laid the foundation for large-scale expression profiling. However, compared with the rapid development of the instruments, methods for processing, integrating, and analyzing data from large-scale proteome expression profiling clearly lag behind. Although some individual profiling studies have developed partial methods for data processing and integration, there is still no systematic, comprehensive framework for this work. The reliability of protein identification remains a difficult problem in mass-spectrometry-based identification, and a gap that is hard to bridge still separates the raw mass spectrometry output from the systems-biology interpretation of the final expression profile. To further improve the reliability of protein identification, bridge this gap between mass spectrometric identification and expression profile analysis, and make the biological analysis of the identified proteins easier, we first surveyed existing research thoroughly and analyzed the requirements of proteome expression profiling in detail. On that basis, we established several strategies to strengthen the reliability of the identification results, and we provided basic annotation information for the identified proteins to support subsequent biological analysis.
For organisms with a known genome sequence, database searching is the most economical and effective protein identification method. To obtain as many highly reliable identifications as possible, we adopted a stepwise search strategy. First, a basic identification is completed by searching a non-redundant database of relatively high quality and broad coverage. Second, to make full use of the mass spectrometry data, we established a strategy for searching other databases step by step (including protein and nucleic acid databases), which provided supplementary identification of the mass spectra and mining of novel proteins. Because database searching is a pattern-matching strategy, the mass spectrometry identifications it produces are often not especially precise, and databases contain many similar proteins or peptides, so a single mass spectrometry result may match more than one protein or peptide. To make full use of these data while accurately describing the imprecision of peptide and protein identification, we established a Group model for peptide and protein identification. For peptide mass fingerprinting results, given the particular nature of these data, statistical methods are first used to obtain the sample…
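The abstract gives no implementation details for the stepwise search, so the following is only a minimal toy sketch under stated assumptions: each spectrum is reduced to a single observed peptide mass, a "match" means a mass difference within a fixed tolerance (real search engines score fragment spectra), and the database names, peptide labels, and masses are all hypothetical. What it illustrates is the control flow: only spectra left unidentified by one database are passed on to the next.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spectrum:
    spectrum_id: str
    precursor_mass: float  # observed peptide mass in Da (toy input, hypothetical)


def search(spectra, database, tol=0.02):
    """Toy pattern match: a spectrum hits every peptide whose mass lies within tol Da."""
    hits = {}
    for s in spectra:
        matches = [pep for pep, mass in database.items()
                   if abs(mass - s.precursor_mass) <= tol]
        if matches:
            hits[s.spectrum_id] = matches
    return hits


def stepwise_search(spectra, databases, tol=0.02):
    """Search databases in priority order; only unmatched spectra go on to the next one.

    Returns ({spectrum_id: (database_name, [peptides])}, [still-unmatched spectra]).
    """
    remaining = list(spectra)
    results = {}
    for db_name, database in databases:
        hits = search(remaining, database, tol)
        for sid, peptides in hits.items():
            results[sid] = (db_name, peptides)
        remaining = [s for s in remaining if s.spectrum_id not in hits]
    return results, remaining


# Hypothetical example: a primary non-redundant protein database first,
# then a (pre-translated) nucleic-acid database for supplementary identification.
spectra = [Spectrum("s1", 1000.50), Spectrum("s2", 1200.60), Spectrum("s3", 1500.00)]
databases = [
    ("nr_protein", {"PEPA": 1000.51}),
    ("translated_nucleotide", {"PEPA": 1000.51, "PEPB": 1200.59}),
]
results, unmatched = stepwise_search(spectra, databases)
print(results)  # {'s1': ('nr_protein', ['PEPA']), 's2': ('translated_nucleotide', ['PEPB'])}
```

Recording which database produced each hit keeps the high-confidence primary identifications separable from the supplementary ones, and the spectra still unmatched at the end are the natural candidates for further mining.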
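The abstract does not define the Group model precisely. One common way to represent the ambiguity it describes is sketched below as an assumption, not as the author's method: each spectrum keeps all of its candidate peptides as a peptide group, and proteins that could explain the same spectrum are merged into one protein group (connected components, computed with a small union-find). The function name `protein_groups` and all identifiers in the example are hypothetical.

```python
from collections import defaultdict


def protein_groups(spectrum_to_peptides, peptide_to_proteins):
    """Group proteins linked by ambiguous evidence (connected components).

    Two proteins end up in the same group if some spectrum could have come
    from either of them, via one shared peptide or via two candidate
    peptides of the same spectrum.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for peptides in spectrum_to_peptides.values():
        proteins = [p for pep in peptides for p in peptide_to_proteins.get(pep, ())]
        for p in proteins:
            find(p)  # register every protein, even unambiguous ones
        for p in proteins[1:]:
            union(proteins[0], p)

    groups = defaultdict(set)
    for p in parent:
        groups[find(p)].add(p)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical data: s1's peptide occurs in two proteins; s2 matches two peptides.
spectrum_to_peptides = {"s1": ["PEPA"], "s2": ["PEPB", "PEPC"], "s3": ["PEPD"]}
peptide_to_proteins = {"PEPA": ["P1", "P2"], "PEPB": ["P3"], "PEPC": ["P4"], "PEPD": ["P5"]}
peptide_groups = {sid: sorted(peps) for sid, peps in spectrum_to_peptides.items()}
groups = protein_groups(spectrum_to_peptides, peptide_to_proteins)
print(groups)  # [['P1', 'P2'], ['P3', 'P4'], ['P5']]
```

Keeping ambiguous matches together, instead of arbitrarily picking one protein, lets a report state that a spectrum supports "P1 or P2" without over-claiming either.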
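[Degree-granting institution]: Academy of Military Medical Sciences of the Chinese People's Liberation Army
[Degree level]: Doctoral
[Year degree awarded]: 2005
[Classification number]: Q51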
