
A Multi-Resolution Shape Clustering Algorithm Based on Transcriptome Data and Identification of lncRNA-Related Cancer Molecular Targets

Published: 2018-05-06 10:39

  Topics: bioinformatics + time-series gene expression profile data; Source: Jilin University, master's thesis, 2017


[Abstract]: Gastric cancer has one of the highest mortality rates of any cancer, and China accounts for roughly 40% of gastric cancer patients worldwide. In recent years, researchers have found that some long non-coding RNAs (lncRNAs) are abnormally expressed in many cancers, which has increased interest in the relationship between lncRNAs and cancer. The detailed molecular mechanisms of gastric cancer are still poorly understood, and this is especially true for lncRNAs. Interactions between lncRNAs and cancer can be established by biological experiments, but experimentally identifying cancer-related lncRNAs usually requires considerable time and expense. This thesis proposes a computational method that determines the relationship between lncRNAs and gastric cancer by reusing exon-array data from gastric cancer studies. The experiments identified one specific lncRNA (LINC00365) and its differentially expressed target genes, whose products are predicted to be secreted into blood, urine, or saliva and were identified as candidate combined biomarkers for gastric cancer. Multi-source biological knowledge then supports further bioinformatic inference of the functions and molecular mechanisms of the gastric-cancer-related lncRNA and coding-gene biomarkers.

The first part of the thesis is the lncRNA analysis. First, the probes in tumor-related exon array data from the Human Exon 1.0 ST Array platform in the GEO database are re-annotated to obtain expression profiles for lncRNAs and coding genes. Next, a rank-sum test is applied to the gastric cancer data, and for each gene a p-value and the fold change between tumor and normal samples are computed. A gene is considered significantly differentially expressed when its fold change is greater than 1.5 or less than 1/1.5 and its p-value is below 0.01. Pearson and Spearman correlation coefficients are then computed to build a co-expression network, and a logistic function transformation combines the two coefficients into a single weight that highlights significant co-expression relationships. GO terms and pathways of the coding genes co-expressed with differentially expressed lncRNAs are analyzed to infer the biological functions in which the lncRNAs take part. Finally, the coding genes associated with the lncRNAs are checked for whether they can be secreted into body fluids, and combined biomarkers for gastric cancer are identified and validated.

The second part of the thesis studies time-series gene expression profile data. Such data change continuously over time, and analyzing them can reveal meaningful statistical properties and significant biological features. Interest in time-series data mining has grown steadily in recent years, but because the expression values differ at every time point, developing effective analysis methods remains a major challenge. Time-series expression experiments make it possible to explore how gene expression profiles change over time and to understand the dynamic behavior of gene expression, which is essential for studying biology and disease development. Based on multi-resolution fractal features and a mixture-model clustering algorithm, this thesis explores the temporal patterns of gene expression at different resolutions. The multi-resolution fractal features are obtained by wavelet decomposition. Because it works within a probabilistic framework, the multi-resolution shape mixture model algorithm provides a more natural and more robust approach to clustering, and the gene groups it identifies are more biologically meaningful. Applying the multi-resolution shape clustering algorithm to tumor-related time-series gene expression data therefore yields global and local fractal features and partitions the data into clusters with significant biological meaning.
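The differential-expression screen described in the abstract (a rank-sum test combined with a fold-change cutoff) can be sketched roughly as follows. This is an illustration under stated assumptions, not the thesis's implementation: the function names are hypothetical, the p-value uses the normal approximation to the Wilcoxon rank-sum statistic without a tie correction, and the fold change is assumed to be the ratio of mean expression on a linear scale. Only the thresholds (1.5, 1/1.5, 0.01) come from the abstract.

```python
import math


def midranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_p_value(group_a, group_b):
    """Two-sided rank-sum p-value via the normal approximation.

    Assumption: no tie correction is applied to the variance, so this is
    only a rough sketch for small or heavily tied samples.
    """
    n1, n2 = len(group_a), len(group_b)
    ranks = midranks(list(group_a) + list(group_b))
    w = sum(ranks[:n1])  # rank sum of the first group
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    return math.erfc(abs(z) / math.sqrt(2))


def is_differentially_expressed(tumor, normal, fold_cutoff=1.5, alpha=0.01):
    """Apply the abstract's rule: FC > 1.5 or FC < 1/1.5, and p < 0.01.

    Assumption: values are positive, linear-scale expression levels, and the
    fold change is the ratio of the group means.
    """
    fold_change = (sum(tumor) / len(tumor)) / (sum(normal) / len(normal))
    p_value = rank_sum_p_value(tumor, normal)
    changed = fold_change > fold_cutoff or fold_change < 1 / fold_cutoff
    return changed and p_value < alpha


if __name__ == "__main__":
    tumor = [20, 21, 22, 23, 24, 25, 26, 27]   # made-up expression values
    normal = [10, 11, 12, 13, 14, 15, 16, 17]
    print(is_differentially_expressed(tumor, normal))
```

In practice a library routine for the rank-sum test would normally replace the hand-written approximation; it is spelled out here only to keep the sketch self-contained.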
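The abstract states that Pearson and Spearman coefficients are merged into an edge weight through a logistic function, but it does not give the formula. The sketch below shows one plausible form only: the averaging of the two absolute coefficients, the `steepness` and `midpoint` parameters, and all function names are assumptions introduced for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-zero variance assumed)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def midranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the mid-ranks."""
    return pearson(midranks(list(x)), midranks(list(y)))


def coexpression_weight(x, y, steepness=10.0, midpoint=0.5):
    """Map the two correlations to a weight in (0, 1) with a logistic curve.

    Assumption: the score is the mean of |Pearson| and |Spearman|; the
    steepness and midpoint values are illustrative, not from the thesis.
    """
    score = (abs(pearson(x, y)) + abs(spearman(x, y))) / 2
    return 1.0 / (1.0 + math.exp(-steepness * (score - midpoint)))


if __name__ == "__main__":
    lncrna = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up expression profiles
    gene = [2.0, 4.0, 6.0, 8.0, 10.0]
    print(coexpression_weight(lncrna, gene))
```

The logistic curve pushes weak correlations toward 0 and strong ones toward 1, which is one way a transformation of this kind can make significant co-expression edges stand out in the network.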
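For the time-series part, the abstract says only that the multi-resolution features come from a wavelet decomposition. As a minimal sketch of that idea, the code below applies a multi-level orthonormal Haar transform to one expression profile. The choice of the Haar wavelet and the function names are assumptions; the fractal features and the shape mixture model of the thesis are not reproduced here.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform on an even-length signal."""
    scale = math.sqrt(2.0)
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) / scale for i in pairs]  # smooth trend
    detail = [(signal[i] - signal[i + 1]) / scale for i in pairs]  # local change
    return approx, detail


def haar_decompose(signal, levels):
    """Return the coarsest approximation and the detail coefficients per level.

    details[0] is the finest resolution and details[-1] the coarsest.
    """
    if levels < 1 or len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


if __name__ == "__main__":
    # Made-up expression values of one gene at eight time points.
    profile = [1.0, 3.0, 2.0, 6.0, 5.0, 5.0, 4.0, 0.0]
    approx, details = haar_decompose(profile, levels=3)
    for level, coeffs in enumerate(details, start=1):
        print("level", level, "detail coefficients:", coeffs)
    print("coarsest approximation:", approx)
```

The detail coefficients at fine levels describe local shape, while the coarse levels describe the global trend, which is the sense in which a wavelet decomposition gives per-resolution views of a profile before clustering.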
[Degree-granting institution]: Jilin University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: R735.2; TP311.13




Article ID: 1851985


Article link: https://www.wllwen.com/yixuelunwen/zlx/1851985.html

