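Methods for Identifying Cancer Driver Pathways Based on Second-Generation Sequencing Data
Published: 2018-04-25 13:10
Topic: second-generation sequencing technology + cancer; Source: master's thesis, Qufu Normal University, 2016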
[Abstract]: With the development of high-throughput sequencing technologies, researchers can now address a wide range of problems in biology and biomedicine at genome-wide scale, generating massive amounts of biological data in the process. These technologies include microarray technologies (for example, gene expression, copy number variation, genome-wide association studies, and methylation sequencing), second-generation sequencing technologies (for example, RNA-seq, whole-exome sequencing, and whole-genome sequencing), and ChIP-Seq. Analyzing the data produced by these technologies often reveals noteworthy genes, which is of far-reaching significance for subsequent biological interpretation and validation. Cancer is usually caused by the accumulation of gene mutations. Recently, advances in second-generation sequencing have produced large volumes of cancer genome data, which have helped researchers develop algorithms that identify important gene mutations in cancer development; however, these algorithms cannot handle the heterogeneity of genetic aberrations. As a result, many researchers have shifted from studying cancer driver genes to studying the driver pathways that lead to cancer. Identifying cancer driver pathways requires corresponding bioinformatics algorithms. Based on second-generation sequencing data, this thesis focuses on algorithms for identifying cancer driver pathways, proposes effective driver pathway identification algorithms, describes their key steps in detail, and compares their results with those of traditional algorithms. The work is summarized as follows.
First, an improved algorithm is proposed to solve the "maximum weight submatrix" problem, which identifies driver mutation pathways based on two properties of cancer driver pathways: coverage and exclusivity. This improved heuristic optimization algorithm is called the simulated annealing genetic algorithm (SAGA). In particular, gene expression data are integrated into the algorithm so that its results are more biologically meaningful, and satisfactory results were obtained.
Second, in a gene interaction network, a gene variant changes the network structure by altering or removing a node or by changing that node's connections, thereby altering the biochemical properties of gene expression in the network and leading to cancer. Based on this biological phenomenon, the DriverFinder algorithm is proposed. It jointly analyzes gene expression data from normal and cancer samples to identify gene expression outliers, and it filters out random mutations attributable to long gene length by fitting a generalized additive model. DriverFinder identifies biologically meaningful cancer driver mutation genes, and pathway enrichment analysis of these genes then identifies cancer driver pathways. Extensive experimental comparisons show that the algorithm is effective. Finally, the thesis analyzes open problems in current research on identifying cancer driver pathways and the work needed in future research.
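The coverage and exclusivity criteria behind the maximum weight submatrix problem can be illustrated with a small sketch. The abstract does not state the weight function, so the code below assumes the commonly used form W(M) = 2|Γ(M)| − Σ_g |Γ(g)|, where Γ(g) is the set of samples in which gene g is mutated and Γ(M) is the set of samples covered by at least one gene in M. The function names, the toy matrix, and the exhaustive search are illustrative assumptions. The thesis's SAGA heuristic is not reproduced here; brute force stands in for it and is only feasible on tiny inputs.

```python
from itertools import combinations


def weight(matrix, genes):
    """Assumed weight W(M) = 2*|covered samples| - total mutations in M.

    matrix: list of rows (samples), each a list of 0/1 flags per gene.
    genes:  iterable of column indices forming the candidate gene set M.
    High coverage raises the weight; overlapping mutations lower it.
    """
    covered = sum(1 for row in matrix if any(row[g] for g in genes))
    total = sum(row[g] for row in matrix for g in genes)
    return 2 * covered - total


def best_gene_set(matrix, k):
    """Exhaustively search all gene sets of size k (toy inputs only)."""
    n_genes = len(matrix[0])
    return max(combinations(range(n_genes), k),
               key=lambda s: weight(matrix, s))


# Hypothetical toy data: 6 samples x 4 genes.
# Genes 0, 1 and 2 are mutually exclusive and together cover every sample;
# gene 3 overlaps with each of them.
mutations = [
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]

if __name__ == "__main__":
    best = best_gene_set(mutations, 3)
    print(best, weight(mutations, best))
```

On real data the number of candidate gene sets grows combinatorially, which is why heuristic searches such as simulated annealing or genetic algorithms are used instead of enumeration.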
[Degree-granting institution]: Qufu Normal University
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: R73-3; Q811.4
Article ID: 1801463
Article link: https://www.wllwen.com/yixuelunwen/zlx/1801463.html