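Biological Network Analysis and Its Applications in Complex Disease Research
Published: 2018-04-04 19:43
Topic: systems biology; Focus: biological networks; Source: Central South University, doctoral dissertation, 2012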
【Abstract】: Diagnosing and treating complex diseases, of which cancer is the most prominent example, has long been a central and difficult problem in biomedical research. Progress has been held back for years by the limits of biological experimental techniques and of the methods used to analyze experimental results, and no major breakthrough has been achieved. The rapid development of high-throughput biotechnologies now supplies massive amounts of data for complex-disease research. In particular, biological networks, exemplified by gene regulatory networks and protein-protein interaction (PPI) networks, capture the complex relationships among biological macromolecules and provide strong data support for this research. Because such network data have accumulated in large quantities, researchers urgently need new techniques to analyze biological networks and, ultimately, to support the study, diagnosis, and treatment of complex diseases.

Starting from the assessment of how reliable interaction data between biological macromolecules are, this dissertation studies graph clustering, the construction of dynamic networks by fusing multiple data sources, and related techniques, and finally applies them to identifying the disease genes and biological processes involved in complex diseases. The main research work is as follows.

Biological networks produced by current high-throughput experimental techniques have high false-positive and false-negative rates. To address this, we assess the reliability of existing PPI data using Gene Ontology (GO) annotations and semantic similarity, and use statistical analysis and machine learning to find the definition of semantic similarity best suited to this assessment.

Biological networks obtained directly from public databases are static, which clearly does not reflect the dynamic nature of living organisms. By analyzing time-course and tissue-specific gene expression data and fusing them with existing static networks, we constructed biological networks with a degree of temporal and spatial dynamics, carried out a basic analysis of these dynamic networks, and compared them with the static networks.

Most existing algorithms for mining functional modules and protein complexes from biological networks rely only on network topology. Our analysis shows that essential proteins are unevenly distributed across functional modules and complexes, and that both functional modules and complexes have core structures, so essential and non-essential proteins should be treated differently during clustering. On this basis we propose EPOF, a graph clustering algorithm based on essential proteins. Applied to the yeast PPI network and evaluated by GO enrichment analysis and by comparison with known complexes, EPOF performs significantly better than comparable algorithms.

Finally, building on these network analyses, we apply graph clustering to gene expression data from controlled studies of diseases and drugs, compare the clustering results using GO semantic similarity, and identify biological processes associated with the diseases. We also combine disease gene signatures with biological network data to fuse different gene signatures and identify genes closely related to the diseases.

Beginning with the preprocessing of biological network data, this dissertation studies a range of methods for analyzing biological networks and applies them to complex-disease research with good results. Its findings support the study of complex diseases from a systems perspective and help advance research on the diagnosis and treatment of complex diseases such as cancer.
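The abstract says GO annotations and semantic similarity are used to score how reliable each protein-protein interaction is, but it does not say which similarity definition was finally chosen. As an illustration only, the sketch below uses one simple, swapped-in definition: the Jaccard index of the two proteins' GO term sets after each set is closed under its ancestors in the GO hierarchy. The toy terms and proteins, the function names `ancestors`, `go_jaccard`, and `filter_interactions`, and the rule of keeping an interaction only when its score reaches a threshold are all hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Toy GO hierarchy (hypothetical terms): child term -> set of direct parent terms.
PARENTS: Dict[str, Set[str]] = {"B": {"A"}, "C": {"A"}, "D": {"B"}}
# Toy annotations (hypothetical proteins): protein -> directly annotated terms.
ANNOTATIONS: Dict[str, Set[str]] = {"P1": {"D"}, "P2": {"B"}, "P3": {"C"}}


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the term itself plus every ancestor reachable through parent links."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def go_jaccard(a: str, b: str,
               annotations: Dict[str, Set[str]],
               parents: Dict[str, Set[str]]) -> float:
    """Jaccard index of the two proteins' ancestor-closed GO term sets."""
    terms_a = set().union(*(ancestors(t, parents) for t in annotations.get(a, set())))
    terms_b = set().union(*(ancestors(t, parents) for t in annotations.get(b, set())))
    if not terms_a or not terms_b:
        return 0.0  # an unannotated protein gives no GO evidence for the pair
    return len(terms_a & terms_b) / len(terms_a | terms_b)


def filter_interactions(edges: Iterable[Tuple[str, str]],
                        annotations: Dict[str, Set[str]],
                        parents: Dict[str, Set[str]],
                        threshold: float = 0.3) -> List[Tuple[str, str]]:
    """Keep only interactions whose GO similarity reaches the (hypothetical) threshold."""
    return [(a, b) for a, b in edges
            if go_jaccard(a, b, annotations, parents) >= threshold]


if __name__ == "__main__":
    edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3")]
    print(filter_interactions(edges, ANNOTATIONS, PARENTS))
```

With the toy data, P1 and P2 share two of three ancestor-closed terms (score 2/3) and are kept, while P1 and P3 share only the root term (score 1/4) and are dropped. Finding which similarity definition separates true from false interactions best is exactly the question the abstract says was studied with statistics and machine learning.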
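The abstract describes fusing time-course gene expression data with a static network to obtain a network with temporal dynamics, without giving the exact fusion rule. One common style of construction, shown here purely as an assumed illustration, marks a gene as active at a time point when its expression exceeds its own mean plus `k` standard deviations across the series, and keeps a static edge in the snapshot for that time point only when both endpoints are active. The threshold rule, the parameter `k`, and the names `active_genes` and `dynamic_snapshots` are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, Iterable, List, Set, Tuple


def active_genes(expr: Dict[str, List[float]], k: float = 0.0) -> List[Set[str]]:
    """For each time point, return the genes whose value exceeds mean + k * std.

    Assumes every gene has a value for the same, non-empty list of time points.
    """
    n_points = len(next(iter(expr.values())))
    active: List[Set[str]] = [set() for _ in range(n_points)]
    for gene, values in expr.items():
        threshold = mean(values) + k * pstdev(values)  # per-gene activity threshold
        for t, value in enumerate(values):
            if value > threshold:
                active[t].add(gene)
    return active


def dynamic_snapshots(edges: Iterable[Tuple[str, str]],
                      expr: Dict[str, List[float]],
                      k: float = 0.0) -> List[Set[Tuple[str, str]]]:
    """Build one subnetwork per time point from a static edge list.

    An edge survives at time t only if both proteins are active at t;
    proteins without expression data are never considered active.
    """
    edge_list = list(edges)  # materialize so the edges can be scanned once per time point
    return [{(u, v) for u, v in edge_list if u in act and v in act}
            for act in active_genes(expr, k)]


if __name__ == "__main__":
    expr = {"A": [1.0, 5.0, 1.0], "B": [1.0, 6.0, 2.0], "C": [4.0, 1.0, 1.0]}
    static_edges = [("A", "B"), ("B", "C")]
    print(dynamic_snapshots(static_edges, expr))
```

Here A and B peak together at the second time point, so the A-B interaction appears only in that snapshot, while B-C never appears because B and C are never active at the same time. Tissue-specific expression could be handled the same way, with tissues in place of time points.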
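The abstract motivates EPOF with two observations: essential proteins are unevenly distributed across complexes, and complexes have core structures, so essential and non-essential proteins should be handled differently during clustering. It does not describe EPOF's actual steps. Purely to illustrate that idea, the sketch below is a generic, hypothetical core-attachment scheme and not the published EPOF algorithm: cores are connected groups of essential proteins, and a non-essential neighbor is attached to a core when it interacts with at least a fraction `attach_ratio` of the core's members. The function name, the parameter, and the attachment rule are all assumptions.

```python
from typing import Dict, List, Set


def essential_core_clusters(adj: Dict[str, Set[str]], essential: Set[str],
                            attach_ratio: float = 0.5) -> List[Set[str]]:
    """Hypothetical core-attachment clustering seeded by essential proteins.

    `adj` is an undirected graph stored as symmetric neighbour sets.
    This is NOT the published EPOF algorithm; it only illustrates treating
    essential and non-essential proteins differently.
    """
    clusters: List[Set[str]] = []
    assigned: Set[str] = set()
    for seed in sorted(essential):  # sorted for a deterministic order
        if seed in assigned:
            continue
        # Core: connected component of the subgraph induced by essential proteins.
        core: Set[str] = set()
        stack = [seed]
        while stack:
            u = stack.pop()
            if u in core:
                continue
            core.add(u)
            stack.extend(v for v in adj.get(u, set()) if v in essential and v not in core)
        assigned |= core
        # Attachments: non-essential neighbours linked to enough of the core.
        attachments = {
            v
            for u in core
            for v in adj.get(u, set())
            if v not in essential
            and len(adj.get(v, set()) & core) >= attach_ratio * len(core)
        }
        clusters.append(core | attachments)
    return clusters


if __name__ == "__main__":
    adj = {
        "E1": {"E2", "N1"}, "E2": {"E1", "E3", "N1"}, "E3": {"E2", "N2"},
        "E4": {"N3"}, "N1": {"E1", "E2"}, "N2": {"E3"}, "N3": {"E4"},
    }
    print(essential_core_clusters(adj, {"E1", "E2", "E3", "E4"}))
```

Because attachment is decided per core, one non-essential protein can join several clusters, which yields overlapping clusters; whether EPOF behaves this way is not stated in the abstract.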
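EPOF's clusters are evaluated in part by GO enrichment analysis. The abstract does not name the test; a standard choice is the one-sided hypergeometric test, where for a background of N genes, K of them annotated with a term, and a cluster of n genes containing k annotated genes, p = Σ_{i=k}^{min(K,n)} C(K,i)·C(N−K,n−i) / C(N,n). The sketch below applies it with a Bonferroni correction over the terms considered. The function names, the choice of background, and the correction are assumptions made for illustration.

```python
from math import comb
from typing import Dict, Iterable, Set


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N genes, K annotated with the term, n drawn)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def go_enrichment(cluster: Iterable[str],
                  annotations: Dict[str, Set[str]]) -> Dict[str, float]:
    """Bonferroni-corrected enrichment p-value for every GO term seen in the cluster.

    The background is every gene in `annotations`, including genes with no terms.
    """
    background = set(annotations)
    members = set(cluster) & background
    terms = set().union(*annotations.values())
    N, n = len(background), len(members)
    results: Dict[str, float] = {}
    for term in terms:
        K = sum(1 for g in background if term in annotations[g])
        k = sum(1 for g in members if term in annotations[g])
        if k == 0:
            continue  # term absent from the cluster: nothing to test
        p = hypergeom_sf(k, N, K, n)
        results[term] = min(1.0, p * len(terms))  # Bonferroni over all terms
    return results
```

For example, with a 10-gene background where only g1 and g2 carry term T1, a two-gene cluster {g1, g2} has an uncorrected p-value of 1/C(10,2) = 1/45 for T1.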
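The abstract also reports comparing EPOF's clusters with known complexes. A criterion widely used for this in the complex-detection literature is the overlap score OS(A, B) = |A ∩ B|² / (|A| · |B|), with a predicted cluster and a known complex counted as matching when OS reaches a threshold, often 0.2. Whether the dissertation used exactly this criterion is not stated in the abstract, so treat the sketch, its threshold `omega`, and the names `overlap_score` and `match_metrics` as assumptions. Precision is the fraction of predicted clusters that match some known complex, recall is the fraction of known complexes matched by some predicted cluster, and the F-measure is their harmonic mean.

```python
from typing import List, Set, Tuple


def overlap_score(a: Set[str], b: Set[str]) -> float:
    """OS(A, B) = |A & B|^2 / (|A| * |B|); 0 if either set is empty."""
    if not a or not b:
        return 0.0
    shared = len(a & b)
    return shared * shared / (len(a) * len(b))


def match_metrics(predicted: List[Set[str]], known: List[Set[str]],
                  omega: float = 0.2) -> Tuple[float, float, float]:
    """Precision, recall and F-measure of predicted clusters against known complexes."""
    matched_pred = sum(1 for p in predicted
                       if any(overlap_score(p, k) >= omega for k in known))
    matched_known = sum(1 for k in known
                        if any(overlap_score(p, k) >= omega for p in predicted))
    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_known / len(known) if known else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    predicted = [{"a", "b", "c"}, {"x", "y"}]
    known = [{"a", "b", "c", "d"}, {"p", "q"}]
    print(match_metrics(predicted, known))  # (0.5, 0.5, 0.5)
```

In the usage example, {a, b, c} matches {a, b, c, d} with OS = 9/12 = 0.75, while the other predicted cluster and the other known complex have no match, giving precision, recall, and F-measure of 0.5 each.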
【Degree-granting institution】: Central South University
【Degree level】: Doctoral
【Year conferred】: 2012
【Classification No.】: R319; O157.5