
Using Biological Networks to Study the Molecular Mechanisms and Prognosis of Disease

Published: 2018-01-06 12:37

Source: Academy of Military Medical Sciences of the Chinese People's Liberation Army, doctoral dissertation, 2011. Type: degree thesis.


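Related topics: disease-related genes; gene transcriptional regulatory networks; biological pathways; protein-protein interaction networks; disease prognosis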


[Abstract]: Diseases constantly threaten human health and quality of life, and in severe cases can cause death. Because of technical limitations, disease genetics research was for many years small-scale and low-throughput, largely confined to the laboratory, and of limited use for the clinical control of disease. In recent years, with advances in genomics and proteomics, many research institutions have used omics techniques to generate massive amounts of disease-related experimental data for studying how diseases arise and progress and for seeking possible treatments. Large-scale bioinformatic mining of these data has revealed many genes, proteins and biological pathways associated with disease phenotypes, laying a foundation for research on disease mechanisms and for potential clinical diagnosis.
Although omics research on disease has advanced considerably, several problems remain. First, some studies are designed around the pathological process of one particular disease and are hard to generalize to others. Second, studies based on large-scale microarray expression data identify many differentially expressed genes or proteins, but it is unclear how to interpret these results, in particular how the joint action of these genes or proteins influences disease onset and progression. Third, many studies rely on a single type of biological experimental data, so their results have high false-positive rates. Finally, there is no general-purpose software for integrated analysis of omics data across experimental platforms.
To address these problems, this dissertation starts from multiple types of omics data and prior biological knowledge, designs a series of mathematical models and analysis strategies, and uses them to identify a number of disease-related pathways and protein-protein interaction subnetworks; the identified protein subnetworks were then used successfully to predict breast cancer prognosis. The methods and framework established here can be applied to mechanistic and diagnostic research on different diseases. The specific contents are as follows:
First, to predict regulatory relationships between genes under disease conditions, this dissertation develops a microarray-based prediction strategy that integrates principal component analysis, the Pearson correlation coefficient and a support vector machine classifier. Many existing methods for predicting regulatory relationships work directly on the raw data, ignoring both the noise in microarray data and the interactions between genes. A dimensionality reduction algorithm can extract the key information in microarray data and reduce the effect of noise, while the Pearson correlation coefficient (PCC), a measure of expression correlation, accounts for the relationships between genes. We use principal component analysis (PCA) to extract gene expression features, then use the Pearson correlation coefficients between these features and gene expression levels to define a new parameter, FAB, that measures the regulatory relationship between two genes; FAB is fed into a support vector machine (SVM) classifier to predict gene regulatory relationships. The results show that, with a suitable dimensionality reduction algorithm and a regulatory parameter defined on suitable feature vectors, regulatory relationships between genes can be predicted with high accuracy, specificity and sensitivity. This work lays a foundation for studying gene regulatory relationships under disease conditions.
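A minimal sketch of the first strategy in plain Python. The abstract does not define FAB precisely, so the pair feature below (the element-wise product of each gene's PCC with the top principal components, plus the direct PCC between the two genes) is a hypothetical stand-in. Computing PCA by power iteration while treating each gene as one observation in sample space is likewise an assumption about how the expression features were extracted, and all function names are illustrative.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient (PCC) of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for a constant vector; treat as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def principal_components(expr, k, iters=300, seed=0):
    """Top-k principal axes in sample space, treating each gene (row) as one
    observation; power iteration with deflation on the sample covariance."""
    n_genes, n_samples = len(expr), len(expr[0])
    means = [sum(row[j] for row in expr) / n_genes for j in range(n_samples)]
    centered = [[row[j] - means[j] for j in range(n_samples)] for row in expr]
    cov = [[sum(r[i] * r[j] for r in centered) / (n_genes - 1)
            for j in range(n_samples)] for i in range(n_samples)]
    rng = random.Random(seed)
    comps = []
    for _ in range(k):
        v = [rng.random() + 1e-3 for _ in range(n_samples)]
        s = math.sqrt(sum(x * x for x in v))
        v = [x / s for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(n_samples))
                 for i in range(n_samples)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break  # no variance left to explain
            v = [x / norm for x in w]
        # Rayleigh quotient = variance along v; remove it from cov (deflation)
        lam = sum(v[i] * cov[i][j] * v[j]
                  for i in range(n_samples) for j in range(n_samples))
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(n_samples)]
               for i in range(n_samples)]
        comps.append(v)
    return comps


def gene_features(expr, comps):
    """Per-gene feature vector: PCC between the gene's profile and each PC."""
    return [[pearson(row, pc) for pc in comps] for row in expr]


def pair_features(expr, feats, a, b):
    """HYPOTHETICAL stand-in for the FAB parameter of gene pair (a, b):
    element-wise product of both genes' PC correlations, plus their direct PCC."""
    prods = [fa * fb for fa, fb in zip(feats[a], feats[b])]
    return prods + [pearson(expr[a], expr[b])]
```

Labeled pair vectors (known regulatory versus non-regulatory gene pairs) would then train an SVM, for example scikit-learn's `sklearn.svm.SVC` via `fit(X, y)`. The product form is chosen because the sign of a principal component is arbitrary, and the product is unchanged when a component's sign flips.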
Second, to study the mechanisms of disease onset and progression, this dissertation proposes a strategy for identifying disease-related pathways and key genes, and applies it successfully to datasets from patients with type II diabetes and from populations affected by smoking. By integrating disease microarray expression datasets with existing biological pathway databases (KEGG and BioCarta), it introduces, for the first time, a non-negative matrix factorization analysis strategy (NMFAS) to analyze differences in pathway activity levels between disease and normal populations, and resolves the non-uniqueness of the algorithm's solution. This identifies biological pathways whose activity differs significantly in diseased individuals; then, by analyzing how much each member gene contributes to its pathway's activity, it identifies key genes potentially associated with the disease phenotype, providing important clues for studying how diseases arise and progress.
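A minimal sketch of pathway activity scoring with NMF, assuming nonnegative expression values and a rank-1 factorization per pathway: the factor `h` gives one activity value per sample, and the gene weights `w` play the role of each member gene's contribution to pathway activity. The abstract does not say how NMFAS resolves non-uniqueness; fixing the scale of `w` so that the weights sum to 1 is one simple way to remove the scaling ambiguity and is an assumption here, as are Welch's t statistic for the disease-versus-normal comparison and all function names.

```python
import math


def rank1_nmf(X, iters=500, eps=1e-12):
    """Rank-1 NMF, X ~ w h^T, via Lee-Seung multiplicative updates (squared
    Euclidean loss). X is a nonnegative genes x samples list of lists."""
    m, n = len(X), len(X[0])
    w, h = [1.0] * m, [1.0] * n
    for _ in range(iters):
        ww = sum(x * x for x in w)
        h = [h[j] * sum(w[i] * X[i][j] for i in range(m)) / (ww * h[j] + eps)
             for j in range(n)]
        hh = sum(x * x for x in h)
        w = [w[i] * sum(X[i][j] * h[j] for j in range(n)) / (hh * w[i] + eps)
             for i in range(m)]
    # (c*w, h/c) fits equally well for any c > 0; fix the scale by making the
    # gene weights sum to 1 so activities are comparable across runs.
    s = sum(w)
    if s > 0:
        w = [x / s for x in w]
        h = [x * s for x in h]
    return w, h


def welch_t(a, b):
    """Welch's t statistic for the difference in means of two samples."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    se = math.sqrt(va / len(a) + vb / len(b))
    return 0.0 if se == 0 else (ma - mb) / se  # 0.0 in the degenerate case


def score_pathways(expr, pathways, disease, normal):
    """expr: gene -> nonnegative expression per sample; pathways: name -> genes;
    disease/normal: sample indices. Returns name -> (t, gene contributions)."""
    results = {}
    for name, genes in pathways.items():
        members = [g for g in genes if g in expr]
        if not members:
            continue  # no measured member genes
        w, h = rank1_nmf([expr[g] for g in members])
        t = welch_t([h[j] for j in disease], [h[j] for j in normal])
        results[name] = (t, dict(zip(members, w)))
    return results
```

In a real analysis the gene sets would come from KEGG or BioCarta, and the per-pathway statistics would need a significance threshold with correction for testing many pathways.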
Finally, this dissertation presents a strategy for disease diagnosis and prognosis prediction based on protein-protein interaction networks. Using breast cancer metastasis data, it starts from known breast cancer-related genes and applies a random walk algorithm to the human protein-protein interaction network to find candidate breast cancer-related subnetworks; it then uses the cumulative expression of the genes in these subnetworks as input to a support vector machine (SVM) classifier that predicts breast cancer metastasis. Analysis of benchmark datasets shows that the algorithm effectively finds genes related to the disease phenotype and breast cancer-related protein interaction subnetworks, and that it achieves satisfactory accuracy, sensitivity and specificity in predicting breast cancer metastasis.
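A minimal sketch of the third strategy using random walk with restart (RWR), one common form of random walk for propagating from seed genes over a network. The restart probability of 0.3, taking the top-scoring genes as the subnetwork, and summarizing a subnetwork's cumulative expression as the mean z-score of its genes are illustrative assumptions, not details given in the abstract; the function names are hypothetical.

```python
import statistics


def rwr_scores(edges, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Random walk with restart on an undirected PPI network.
    Returns node -> steady-state visiting probability."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    seed_set = {s for s in seeds if s in nbrs}
    if not seed_set:
        raise ValueError("no seed gene is present in the network")
    p0 = {v: (1.0 / len(seed_set) if v in seed_set else 0.0) for v in nbrs}
    p = dict(p0)
    for _ in range(max_iter):
        # With prob. `restart` jump back to a seed, else step to a random neighbor
        nxt = {v: restart * p0[v] for v in nbrs}
        for v, ns in nbrs.items():
            share = (1.0 - restart) * p[v] / len(ns)
            for u in ns:
                nxt[u] += share
        delta = sum(abs(nxt[v] - p[v]) for v in nbrs)
        p = nxt
        if delta < tol:
            break
    return p


def top_subnetwork(scores, size):
    """Highest-scoring genes (ties broken by name) as the candidate subnetwork."""
    return sorted(scores, key=lambda g: (-scores[g], g))[:size]


def subnetwork_activity(expr, genes):
    """Per-sample cumulative expression: mean z-score of the subnetwork genes."""
    zs = []
    for g in genes:
        if g not in expr:
            continue  # gene not measured on the array
        mu, sd = statistics.mean(expr[g]), statistics.pstdev(expr[g])
        if sd > 0:
            zs.append([(x - mu) / sd for x in expr[g]])
    if not zs:
        raise ValueError("no measured, non-constant gene in the subnetwork")
    return [sum(col) / len(zs) for col in zip(*zs)]
```

The per-sample activities of several such subnetworks would form the feature vectors for an SVM classifier (for example scikit-learn's `sklearn.svm.SVC`) that labels each tumor as metastatic or not.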
In summary, this dissertation systematically studies the molecules and interactions involved in the onset and progression of human disease at several levels: genes, gene regulatory relationships, protein-protein interaction subnetworks and biological pathways. By jointly considering existing biological knowledge, gene expression, gene regulation, biological pathways and protein-protein interactions, and by using data mining methods including dimensionality reduction, machine-learning classification and network propagation, it identifies genes, proteins and protein subnetworks potentially associated with disease, and uses the identified disease-related protein subnetworks to diagnose disease successfully. To some extent these methods avoid the shortcomings and limitations of existing approaches and improve prediction accuracy and sensitivity, thereby advancing our understanding of biomolecules, networks and whole biological systems under disease conditions. In addition, all of the methods can be extended from one disease to others and are readily extensible.
The main innovations of this dissertation are: (1) using a dimensionality reduction algorithm to extract expression features from microarray data and combining them with a gene co-expression strength parameter to predict gene regulatory relationships, improving prediction accuracy, sensitivity and specificity; (2) introducing, for the first time, a non-negative matrix factorization analysis strategy to identify disease-related biological pathways and key genes; and (3) using a network propagation algorithm to analyze disease-related protein interactions and predict disease prognosis, with substantially improved sensitivity and specificity. The parts of this research support and complement one another, are general and extensible, and can be applied to mechanistic and diagnostic studies of different diseases, providing a useful reference for the discovery of disease biomarkers and drug targets and for clinical diagnosis.

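[Degree-granting institution]: Academy of Military Medical Sciences of the Chinese People's Liberation Army
[Degree level]: Doctorate
[Year awarded]: 2011
[Classification number]: R341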
