Research on Algorithms for Detecting Differentially Expressed Genes Based on RNA-Seq Data

Published: 2018-12-08 12:31
[Abstract]: RNA-Seq (Ribonucleic Acid Sequencing) is a routine experimental technique in modern bioinformatics research. Its main purpose is to screen sequencing data for differentially expressed genes, that is, to detect genes whose expression levels differ across samples. Differential expression analysis studies how the same class of genes is expressed at different developmental stages or under different physiological conditions of an organism. It is meaningful both statistically and biologically, and provides an important theoretical basis for understanding the nature of life processes and for studying the regulation of gene expression. This thesis analyzes the processing pipeline for detecting differentially expressed genes in RNA-Seq data. The main contents are as follows:
(1) Building on weighted trimmed mean of M-values (TMM) normalization and geometric-mean normalization, an improved normalization algorithm based on the coefficient of variation and median absolute deviation adjustment is presented. The data are first normalized separately with the TMM method and the geometric-mean method. The coefficient of variation of each gene (row) is then computed in the two normalized data sets, and the two coefficients are compared to obtain the optimal coefficient of variation, which yields a new data set. Finally, a median absolute deviation adjustment is applied to the new data to complete the normalization. Experimental results show that the algorithm not only removes technical sequencing error and adjusts all sequenced samples to the same level, but also gives smaller error and higher precision.
(2) Building on the svaseq (Surrogate Variable Analysis Sequencing) algorithm, an improved svaseq algorithm for removing batch effects is presented. A regularized log-transformation model and a log-transformation model are first constructed according to the relevant significance parameter. The model parameters are then estimated by weighted least squares to obtain the residual matrix of the data, and this matrix is factorized to estimate the surrogate variables. Experimental results show that the algorithm removes batch effects from the data more effectively and also improves the differential expression results to some extent.
(3) Building on the DESeq (Differential Expression Sequencing) algorithm, an improved DESeq algorithm for detecting differentially expressed genes is presented. The data are assumed to follow a negative binomial distribution model. The total sequencing count of each sample is first estimated from the improved normalization factors, the mean and variance of the model are computed, and the dispersion parameter is estimated. An exact test is then used for differential expression analysis. Experimental results show that the algorithm detects differentially expressed genes more effectively and improves accuracy by 6.9%.
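The first part of step (1) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes the geometric-mean method refers to DESeq-style median-of-ratios size factors, and it assumes the "optimal" coefficient of variation is simply the smaller of the two per-gene values. The function names are hypothetical, and the TMM normalization and the median absolute deviation adjustment are left out because the abstract does not specify them.

```python
import numpy as np


def median_of_ratios_size_factors(counts):
    """Geometric-mean (median-of-ratios) size factors, one per sample.

    counts: genes x samples matrix of raw read counts.
    Assumption: this is the "geometric-mean normalization" the abstract names.
    """
    counts = np.asarray(counts, dtype=float)
    # Only genes with a nonzero count in every sample have a usable geometric mean.
    positive = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[positive])
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    # Size factor = median over genes of (sample count / gene geometric mean).
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))


def coefficient_of_variation(normalized):
    """Per-gene coefficient of variation (sample std / mean) across samples."""
    normalized = np.asarray(normalized, dtype=float)
    mean = normalized.mean(axis=1)
    std = normalized.std(axis=1, ddof=1)
    # Genes with zero mean get an infinite CV so they are never preferred.
    return np.divide(std, mean, out=np.full_like(mean, np.inf), where=mean > 0)


def pick_lower_cv_rows(norm_a, norm_b):
    """Per gene, keep the row from whichever normalization has the smaller CV.

    Assumption: "optimal" in the abstract means "smaller"; ties go to norm_a.
    """
    norm_a = np.asarray(norm_a, dtype=float)
    norm_b = np.asarray(norm_b, dtype=float)
    use_a = coefficient_of_variation(norm_a) <= coefficient_of_variation(norm_b)
    return np.where(use_a[:, None], norm_a, norm_b), use_a


# Toy data: sample 2 is sequenced twice as deeply as sample 1, sample 3 four times.
counts = np.array([[10, 20, 40], [100, 200, 400], [5, 10, 20]])
size_factors = median_of_ratios_size_factors(counts)
normalized = counts / size_factors  # every row becomes constant across samples
```

In practice `norm_a` and `norm_b` would be the same count matrix normalized by TMM and by the geometric-mean method respectively. Here they are left as plain arguments so that the selection rule can be inspected on its own.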
[Degree-granting institution]: Dalian Maritime University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: Q811.4


Article ID: 2368351



Article link: https://www.wllwen.com/shoufeilunwen/benkebiyelunwen/2368351.html

