The Role of Tumor Purity in Differential Gene Expression and Tumor Subtype Clustering

Published: 2018-02-27 05:32

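  Keywords: differentially expressed genes; generalized linear model; DNA methylation; EM algorithm; tumor purity; cancer subtypes  Source: Shanghai Normal University, doctoral dissertation, 2017  Type: degree thesis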


[Abstract]: Differential gene expression analysis between tumor and normal cells and the identification of tumor subtypes are both highly important for the early diagnosis and clinical treatment of cancer. However, tumor tissue obtained in the clinic usually contains a certain number of other cells, such as normal cells, immune cells, stromal cells, and vascular cells. Among these, the admixture of normal cells adversely affects both differential gene expression analysis and tumor subtype classification. Building suitable statistical models that correct for the effect of tumor purity on differential expression analysis and on tumor clustering is therefore an urgent task. This dissertation studies these two problems systematically. First, we study the effect of tumor purity on the analysis of differentially expressed genes. Simulation shows that the relationship between tumor purity and the difference in gene expression is multiplicative, not linear as previously assumed. Ignoring tumor purity, or adding it to the regression model as a covariate, biases the results of differential expression analysis in either case. To address this, we propose a generalized least-squares model and a Wald test for assessing, gene by gene, the difference between tumor and normal cells. Analysis of TCGA tumor data shows that the method outperforms the traditional t-test and limma, both on metrics such as the number of differentially expressed genes and the consistency of test statistics across tumors, and on functional relevance to the corresponding cancer type. Second, we study the effect of tumor purity on unsupervised clustering of tumor samples. Analysis of clustering results on TCGA breast cancer 450K methylation array data shows that, with traditional k-means and NMF, tumor purity biases the clustering: tumor samples of similar purity tend to fall into the same cluster, and samples of low purity are especially prone to misassignment. Based on this, we propose a model-based clustering algorithm for DNA methylation array data. We assume that the methylation level of tumor samples at each site follows a Gaussian mixture distribution, and we use the EM algorithm for parameter estimation and sample clustering. Simulation shows that our algorithm is more accurate than k-means. Analysis of 23 TCGA cancer types shows that our method yields less biased clustering results than k-means and NMF.
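The multiplicative relationship described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the dissertation's actual model: it assumes the observed expression of a gene is `mu_normal + purity * delta + noise` (normal samples having purity 0), and it uses ordinary least squares as a stand-in for the generalized least-squares fit, because the abstract does not state the variance structure. The function name `purity_wald_test` and all parameter values are hypothetical.

```python
import numpy as np


def purity_wald_test(y, purity):
    """Fit y_i = mu_normal + purity_i * delta + noise by ordinary least squares.

    Assumption: OLS stands in for the generalized least-squares model named in
    the abstract. Returns the estimated tumor-normal difference (delta_hat)
    and its Wald statistic (estimate divided by its standard error).
    """
    y = np.asarray(y, dtype=float)
    purity = np.asarray(purity, dtype=float)
    # Design matrix: intercept (normal-cell mean) and purity (scales delta).
    X = np.column_stack([np.ones_like(purity), purity])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    delta_hat = beta[1]
    wald_z = delta_hat / np.sqrt(cov[1, 1])
    return delta_hat, wald_z


# Simulated data for one gene (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n_normal, n_tumor = 50, 200
mu_normal, delta_true = 5.0, 2.0
purity = np.concatenate([np.zeros(n_normal), rng.uniform(0.3, 0.9, n_tumor)])
y = mu_normal + purity * delta_true + rng.normal(0.0, 0.3, purity.size)

delta_hat, wald_z = purity_wald_test(y, purity)

# Naive comparison that ignores purity: difference of group means.
naive_diff = y[n_normal:].mean() - y[:n_normal].mean()

print(f"purity-aware estimate: {delta_hat:.2f}, Wald z: {wald_z:.1f}")
print(f"naive tumor-normal difference: {naive_diff:.2f}")
```

In this toy setting the naive difference of group means is shrunk toward zero by roughly the average tumor purity, while regressing on purity recovers a value close to the simulated `delta_true`. That shrinkage is the kind of bias the abstract attributes to ignoring purity.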
[Degree-granting institution]: Shanghai Normal University
[Degree level]: Doctoral
[Year degree granted]: 2017
[Classification number]: R73-3;TP311.13

[Author]: Zhang Weiwei (张伟伟)




