Tumor Subtype Clustering Analysis Based on Sparse Low-Rank Regression
[Abstract]: Cancer is currently one of the leading causes of human death. With the development of next-generation sequencing technology, researchers worldwide have carried out large-scale cancer genome sequencing projects (such as TCGA) and obtained large amounts of biological data of different types (such as mRNA expression, DNA methylation, and somatic mutation data). These data help in understanding the pathogenesis of cancer, identifying accurate cancer subtypes, and designing effective drugs for treatment. They also raise a new problem: how to fully integrate and exploit multi-omics sequencing data when designing tumor subtype clustering algorithms, which has become a hot topic in bioinformatics. Most commonly used tumor subtype clustering methods perform semi-supervised or unsupervised sample assignment on a single type of biological data. Their drawback is that multiple correlated data types cannot be used within one clustering procedure, which easily leads to information loss. In recent years, several tumor subtype clustering algorithms based on multi-omics data have been proposed. However, these methods are still at an early stage of development, and many problems remain open, for example how to pre-screen genes and how to build a truly integrative data model that yields more accurate results. New data analysis methods are therefore urgently needed. The core idea of this work is to use sparse low-rank regression to project high-dimensional multi-omics data into a low-dimensional subspace that contains the major biological processes, thereby achieving data fusion and fast clustering.
Chapter 1 introduces the research background and significance of subtype analysis based on multi-omics data, together with the current state of research and the main methods used in China and abroad. Chapter 2 describes the data commonly used in cancer subtype analysis and reviews representative clustering algorithms that integrate multiple data types. Chapter 3 presents the theory behind an improved iCluster algorithm based on sparse low-rank regression. Sparse low-rank regression is used in place of the PCA algorithm to compute an initial value of the coefficient matrix that is both sparse and low-rank, which supports estimation of the optimal posterior probabilities in the subsequent iterations. Comparison experiments against the original iCluster algorithm verify the effectiveness of the improved algorithm. Chapter 4 presents the theory of a clustering algorithm based on sparse low-rank regression. It uses a suitable sparse low-rank regression method to find an effective low-dimensional subspace for each type of biological data, and then integrates these subspaces into a sample-sample similarity matrix. Finally, cancer subtypes are identified by spectral clustering. Experimental results on data sets from three different cancer types show that the proposed method is more effective in predicting survival. In the GBM (glioblastoma multiforme) subtype analysis, which integrates expression and methylation data, the method captures biological features more effectively, identifies subgroups within subtypes, and reveals a new, previously hidden subtype. Chapter 5 discusses problems encountered in the research, summarizes the thesis, and outlines directions for future work.
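The Chapter 4 pipeline (one low-dimensional subspace per data type, a fused sample-sample similarity matrix, then spectral clustering) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's algorithm: the abstract does not give the regression formulation, so a truncated SVD with soft-thresholded loadings stands in for sparse low-rank regression, and the Gaussian kernel, the averaging of similarity matrices, and all function and parameter names (`sparse_low_rank_subspace`, `threshold`, and so on) are hypothetical choices.

```python
import numpy as np

def sparse_low_rank_subspace(X, rank, threshold):
    """Stand-in for sparse low-rank regression (assumption, not the thesis's method):
    keep `rank` SVD directions and soft-threshold their loadings for sparsity.
    X is samples x features; returns samples x rank coordinates."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:rank].T                                            # low-rank loadings
    V = np.sign(V) * np.maximum(np.abs(V) - threshold, 0.0)    # sparse loadings
    return Xc @ V

def gaussian_similarity(Z):
    """Sample-sample similarity from subspace coordinates (assumed Gaussian kernel)."""
    sq = np.sum(Z ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
    scale = np.median(d2[d2 > 0]) if np.any(d2 > 0) else 1.0   # data-driven bandwidth
    return np.exp(-d2 / scale)

def spectral_clustering(W, k, n_iter=50):
    """Normalized spectral clustering with a small k-means on the embedding."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]          # D^-1/2 W D^-1/2
    _, vecs = np.linalg.eigh(S)                                # ascending eigenvalues
    U = vecs[:, -k:]                                           # top-k eigenvectors
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    # Deterministic farthest-point initialization of the k centers.
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dists, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels

def integrate_and_cluster(views, k, rank=2, threshold=0.1):
    """Fuse several data types (each samples x features) and cluster the samples."""
    sims = [gaussian_similarity(sparse_low_rank_subspace(X, rank, threshold))
            for X in views]
    W = np.mean(sims, axis=0)                                  # fused similarity matrix
    return spectral_clustering(W, k)

if __name__ == "__main__":
    # Synthetic stand-ins for two data types measured on the same 40 samples.
    rng = np.random.default_rng(0)
    shift = np.zeros((40, 50))
    shift[:20, :5], shift[20:, :5] = 3.0, -3.0
    expression = rng.normal(size=(40, 50)) + shift
    methylation = rng.normal(size=(40, 50)) + shift
    print(integrate_and_cluster([expression, methylation], k=2))
```

Averaging the per-type similarity matrices is only one simple way to fuse them; a weighted combination or a network-fusion step would fit the same outline.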
[Degree-granting institution]: Anhui University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: R730.2; O212.1
[Author]: Ge Shuguang
Article ID: 2134453
Article link: https://www.wllwen.com/kejilunwen/yysx/2134453.html