Research on a Colorectal Cancer Risk Prediction Model Based on Data Dimensionality Reduction and Regression Analysis
Topic: colorectal cancer. Focus: environmental exposure. Source: Southwest University, master's thesis, 2017. Document type: degree thesis.
[Abstract]: Colorectal cancer (CRC) has become one of the most common malignant tumors worldwide, with about 1.2 million new cases and about 600,000 deaths each year. For decades the incidence of colorectal cancer in China was relatively low, but in recent years, as living standards have risen, diet and lifestyle have changed, and average life expectancy has lengthened markedly, both incidence and mortality in China have risen year by year. Colorectal cancer has a high mortality rate as well as a high global incidence. Patients diagnosed at an early stage can reach a 5-year survival rate above 90%. Because onset is insidious, however, more than 60% of patients are already at an intermediate or advanced stage when they first seek care. When local metastasis is present at diagnosis, the 5-year survival rate falls to 68%, and for patients with distant metastasis it is only 11%.

Cancer researchers have carried out a great deal of work on the diagnosis and treatment of colorectal cancer, but its etiology and pathogenesis are still not fully understood. A large body of epidemiological research indicates that the development of colorectal cancer is a complex process influenced by many factors, including environmental and genetic ones, yet exactly which environmental and genetic factors affect its occurrence remains unclear. Exploring the carcinogenic factors of colorectal cancer and predicting the risk of developing it are therefore important for achieving early diagnosis and early treatment.

This thesis builds a multi-level colorectal cancer prediction model from biomedical classification, data dimensionality reduction, and regression-based prediction. On top of that model, it proposes a nonlinear regression method, a generalized kernel recursive least squares algorithm under the maximum correntropy criterion (Generalized Kernel Recursive Maximum Correntropy Algorithm, GKRMC), to improve prediction accuracy for colorectal cancer. The specific work is as follows.

(1) The GKRMC regression method is proposed. GKRMC is a nonlinear regression method used for the regression-based prediction in the model. After introducing the basic concept of correntropy, the thesis derives GKRMC from the theory of the maximum correntropy criterion and from kernel recursive least squares (Kernel Recursive Least Squares, KRLS), so that the prediction model can be trained more effectively on noisy samples and is more robust to noise.

(2) A multi-level colorectal cancer prediction model is constructed. The construction is described in three modules: 1. biomedical classification; 2. data dimensionality reduction; 3. regression-based prediction. In the biomedical classification module, biomedical knowledge is used to divide the experimental data into four categories (genetic information, demographic characteristics, lifestyle, and food), which better reflects the real distinction between genetic polymorphisms and environmental factors. A suitable dimensionality reduction model is then built to further select the features that are associated with colorectal cancer and show statistically significant differences. Finally, the regression-based prediction procedure is described, and the predictive accuracy of logistic regression, support vector machines, KRLS, and GKRMC is compared to assess the advantages of GKRMC over classical regression-based prediction algorithms and its differences from them.

(3) Experimental results and analysis. The experimental part implements the proposed algorithm, highlights the key experimental procedures and data, and runs comparative experiments to demonstrate the advantages of GKRMC. It first summarizes the biomedical classification results, then presents the results after dimensionality reduction, and finally compares the methods side by side, showing the advantage of GKRMC in colorectal cancer prediction accuracy.

Based on this work, the thesis explores the association of environmental factors and genetic polymorphisms with colorectal cancer risk. The results show that: (1) both environmental and genetic factors play an important role in the development of colorectal cancer; (2) with the selected biomarkers as inputs to the regression model, each individual's colorectal cancer risk can be identified accurately and effectively; (3) the proposed GKRMC algorithm has better predictive ability than traditional regression methods.
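The noise robustness that the abstract attributes to the maximum correntropy criterion can be illustrated with a small sketch. This is not the thesis's GKRMC algorithm, whose details the abstract does not give. It only computes the standard sample estimate of correntropy with a Gaussian kernel and contrasts it with mean squared error on toy data. The function names, the kernel width `sigma`, and the data values are all assumptions made for illustration.

```python
import math


def gaussian_kernel(e, sigma):
    """Unnormalized Gaussian kernel evaluated at an error value e."""
    return math.exp(-(e * e) / (2.0 * sigma * sigma))


def empirical_correntropy(targets, predictions, sigma=1.0):
    """Sample estimate of correntropy: mean Gaussian kernel of the errors."""
    errors = [d - y for d, y in zip(targets, predictions)]
    return sum(gaussian_kernel(e, sigma) for e in errors) / len(errors)


def mean_squared_error(targets, predictions):
    """Mean of the squared errors, for comparison."""
    errors = [d - y for d, y in zip(targets, predictions)]
    return sum(e * e for e in errors) / len(errors)


# Hypothetical toy data: the second prediction list is identical to the
# first except for one gross outlier in the last position.
targets = [0.0, 1.0, 0.0, 1.0, 1.0]
clean = [0.1, 0.9, 0.1, 0.9, 0.9]
with_outlier = [0.1, 0.9, 0.1, 0.9, 51.0]

for name, preds in (("clean", clean), ("with outlier", with_outlier)):
    print(name,
          "MSE:", mean_squared_error(targets, preds),
          "correntropy:", empirical_correntropy(targets, preds))
```

In this sketch a single outlier dominates the mean squared error, whereas it can lower the correntropy estimate by at most 1/N, because the Gaussian kernel of a very large error is close to zero. A criterion that maximizes correntropy therefore gives gross errors little influence, which is the usual motivation for using it on noisy samples.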
[Degree-granting institution]: Southwest University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: R735.3; O212.1
Article ID: 1569561
Article link: https://www.wllwen.com/yixuelunwen/zlx/1569561.html