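Research on Methods for Analyzing Gene-Disease Associations Based on the Human Phenotype Ontology
Published: 2018-01-23 12:41
Keywords: gene prediction, phenotype, Human Phenotype Ontology, phenotype network. Source: Harbin Institute of Technology, master's thesis, 2015. Document type: degree thesis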
[Abstract]: Next-generation sequencing has accelerated the production of genetic data, and the sheer volume has shifted the challenge from sequencing itself to making effective use of the data already generated. A phenotype is a characteristic that an organism exhibits under the influence of its genes, its environment, and other factors, so phenotypes are closely tied to both genes and diseases. Phenomics has become an important bridge between genes and diseases, and phenotype research is now an important means of predicting and discovering disease-causing genes. Current work on associations among genes, phenotypes, and diseases follows three main approaches. The first mines the existing biomedical literature with text mining and related techniques to uncover relationships between biomedical entities. The second uses known relationships to build networks of phenotypes, diseases, proteins, genes, and other entities, and discovers new relationships within those networks. The third uses ontologies, structured knowledge systems such as the Gene Ontology and phenotype ontologies, to compute similarity between entities. Each approach has its own strengths and weaknesses. Studying ontology-based phenotype similarity measures can help predict a patient's disease-causing genes and diseases and makes full use of the ontology. This thesis uses the Human Phenotype Ontology (HPO) to study the similarity between genes and phenotypes and between diseases and phenotypes, and from that to predict a patient's disease-causing genes and diseases. Building on the information content of phenotype terms in the HPO and combining it with the ontology's directed acyclic graph structure, it proposes a similarity measure based on paths in the HPO. In validation, the method outperformed other mainstream ontology-based methods at predicting disease-causing genes and diseases on every dataset tested (ideal, noisy, imprecise, and both noisy and imprecise).
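As a rough illustration of ontology-based phenotype similarity, the sketch below implements the Resnik measure (the baseline the abstract compares against), not the thesis's own path-based method, whose formula the abstract does not give. The toy DAG, the term names, the disease annotations, and the best-match-average set score are all assumptions made for the example; real HPO terms and annotation files are not used.

```python
import math

# Toy is-a DAG (child -> parents). Term names are invented, not real HPO IDs.
PARENTS = {
    "root": [],
    "neuro": ["root"],
    "eye": ["root"],
    "seizure": ["neuro"],
    "ataxia": ["neuro"],
    "cataract": ["eye"],
}

# Hypothetical disease-to-phenotype annotations.
ANNOTATIONS = {
    "disease_A": {"seizure", "ataxia"},
    "disease_B": {"cataract"},
    "disease_C": {"seizure", "cataract"},
}


def ancestors(term):
    """Return the term together with all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log(fraction of diseases annotated with t or a descendant of t)."""
    counts = {t: 0 for t in PARENTS}
    for terms in ANNOTATIONS.values():
        covered = set().union(*(ancestors(t) for t in terms))
        for t in covered:
            counts[t] += 1
    total = len(ANNOTATIONS)
    return {t: -math.log(c / total) for t, c in counts.items() if c > 0}


IC = information_content()


def resnik(t1, t2):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(IC.get(t, 0.0) for t in common)


def set_similarity(query, target):
    """Average, over query terms, of the best Resnik match in the target set."""
    return sum(max(resnik(q, t) for t in target) for q in query) / len(query)


def rank_diseases(query):
    """Order the annotated diseases from most to least similar to the query."""
    scores = {d: set_similarity(query, terms) for d, terms in ANNOTATIONS.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Calling `rank_diseases({"ataxia", "seizure"})` scores each hypothetical disease against a patient's phenotype set. Because Resnik looks only at the most informative common ancestor, it ignores how the two terms connect to that ancestor in the DAG, which is the kind of structural information a path-based measure can add.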
On the dataset containing both noise and imprecision, for example, the proposed method exceeds Resnik, the second-best method, by 17.3 percentage points when predicting disease-causing genes and by 18.1 percentage points when predicting diseases. Studies have found that, in disease and gene networks, diseases of the same class and functionally related genes tend to cluster. A patient's observed phenotypes inevitably include some that are unrelated to the disease or its causative gene (noise phenotypes). This clustering property can be used to pick out the noise phenotypes in a phenotype set, which improves the accuracy of disease-causing gene and disease prediction. This thesis constructs a phenotype network and applies the PageRank algorithm to it to identify central and peripheral phenotypes, thereby detecting the noise phenotypes in a phenotype set and denoising it. In simulation experiments the method detects noise phenotypes well (average inversion number of 0.136), and denoising improves the accuracy of predicting disease-causing genes and diseases.
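The PageRank-based denoising idea can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not say how the phenotype network's edges are defined or weighted, so the undirected toy graph, the phenotype names, and the helper `noise_candidates` are hypothetical, and the PageRank here is a plain power iteration rather than the thesis's implementation.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank on an undirected graph {node: set(neighbours)}."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by isolated nodes is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new_rank = {}
        for v in nodes:
            # Each neighbour u passes an equal share of its rank to v.
            incoming = sum(rank[u] / len(graph[u]) for u in graph[v])
            new_rank[v] = (1 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank


def noise_candidates(graph, k=1):
    """Return the k lowest-ranked (most peripheral) phenotypes."""
    rank = pagerank(graph)
    return sorted(graph, key=rank.get)[:k]


# Hypothetical patient phenotype set: three mutually related phenotypes
# plus one that is only weakly attached to the rest.
PHENOTYPE_NETWORK = {
    "seizure": {"ataxia", "tremor"},
    "ataxia": {"seizure", "tremor"},
    "tremor": {"seizure", "ataxia", "cataract"},
    "cataract": {"tremor"},
}
```

With this toy network, `noise_candidates(PHENOTYPE_NETWORK)` flags the weakly attached phenotype as the most peripheral one. Whether a low-ranked phenotype is actually dropped, and at what threshold, is a design choice the abstract leaves open.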
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree conferred]: 2015
[Classification number]: R440; TP391.1
Article ID: 1457562
Article link: https://www.wllwen.com/huliyixuelunwen/1457562.html