A Study of a Particle Swarm Optimization-Based Selective Bagging Algorithm for ¹H NMR Metabolomics Data Analysis of Lung Cancer Serum

Published: 2018-09-14 07:38
[Abstract]: Metabolomics (metabonomics), one of the omics disciplines that study the full range of biological molecules, takes as its object a large number of metabolites, including small molecules such as certain amino acids, lipids, and organic acids. It analyzes their changes and metabolic pathways as a whole and produces variable-rich data. The central task of metabolomics data analysis is to mine the potentially key information hidden in these high-dimensional, complex data. Selective ensemble learning algorithms choose only part of the many learners generated by an ensemble to take part in the final ensemble, which yields better generalization performance and higher prediction efficiency. They offer a new way to improve ensemble learning and have attracted growing attention from researchers in recent years. In this thesis, in view of the inherent characteristics of metabolomics data, the strengths and weaknesses of the bootstrap aggregating (Bagging) ensemble algorithm, and the strong optimization capability of particle swarm optimization (PSO), we introduce PSO to improve the performance of Bagging and develop a selective Bagging algorithm. We apply it to improve the stability and generalization ability of two base learning algorithms, the classification tree (CT) and partial least squares-discriminant analysis (PLS-DA), and thereby propose two novel methods for metabolomics data analysis. The work consists of the following two studies.
(1) Starting from the theoretical decomposition of the generalization error of ensemble learning algorithms, increasing the diversity among sub-models while maintaining their accuracy can further improve ensemble performance. In this chapter, we therefore propose a PSO-based selective Bagging algorithm and apply it to improve the stability and generalization performance of CT, an unstable pattern recognition technique, yielding a new metabolomics data analysis technique, PSOBAGCT. The algorithm first uses Bagging to generate a series of highly diverse CT models (the bagged classification tree algorithm, BAGCT). It then designs the PSO objective function to account for both the error of the ensemble model and the diversity among sub-models, and uses PSO to select a subset of accurate, mutually diverse sub-models for the final ensemble. Finally, plurality voting produces the output of the final ensemble. PSOBAGCT was applied to the metabolomics analysis of ¹H NMR data collected from three groups of serum samples: healthy volunteers, newly diagnosed lung cancer patients, and lung cancer patients who relapsed after treatment. BAGCT and CT were applied to the same data to evaluate the new algorithm. The results show that Bagging significantly improves the recognition performance and stability of a single classification tree, and that, by introducing PSO, the selective bagged classification tree algorithm (PSOBAGCT) generalizes clearly better than BAGCT. In addition, the algorithm identified significant metabolites that distinguish lung cancer patients from healthy subjects, such as lipids, lactate, glycoproteins, alanine, threonine, inositol, 3-hydroxybutyrate, dimethylamine, glutamine, proline, and trimethylamine.
(2) Considering the strengths and weaknesses of PLS-DA as a pattern recognition technique for metabolomics data, in this chapter we use it as the base learner and apply the selective Bagging algorithm (PSOBAG) developed in Chapter 2 to improve its recognition performance. This forms another new metabolomics data analysis method: PSO-based selective bagged partial least squares-discriminant analysis (PSOBAGPLS-DA). By introducing PSO, the method selects the best subset of all the PLS-DA models trained by Bagging (bagged PLS-DA, BAGPLS-DA). PSOBAGPLS-DA, together with BAGPLS-DA and PLS-DA, was likewise applied to ¹H NMR-based metabolomics data from lung cancer serum. The results show that BAGPLS-DA, which trains a series of PLS-DA models with Bagging, significantly improves the recognition performance of PLS-DA, and that the selective Bagging algorithm developed by introducing PSO further improves modeling performance. PSOBAGPLS-DA also identified several key serum metabolic markers of lung cancer with significant differences, including lipids, lactate, glycoproteins, alanine, threonine, inositol, glutamine, proline, trimethylamine, and choline.
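The abstract describes the PSOBAG pipeline only in outline: Bagging produces diverse sub-models, PSO searches for a subset using an objective that weighs ensemble error against sub-model diversity, and plurality voting combines the chosen sub-models. The sketch below illustrates that outline in pure Python under explicit assumptions and is not the thesis's implementation. A depth-1 decision stump stands in for the CT or PLS-DA base learner; the objective is assumed to be validation plurality-vote error minus lam times the mean pairwise disagreement rate; the subset search uses a common binary PSO variant (sigmoid transfer function with an inertia weight); and every function name, parameter value, and the toy data set are hypothetical.

```python
import math
import random
from collections import Counter

def train_stump(X, y):
    """HYPOTHETICAL stand-in for the base learner (CT in the thesis): a depth-1
    tree choosing the (feature, threshold) split with the fewest training errors."""
    overall = Counter(y).most_common(1)[0][0]
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            lc = Counter(left).most_common(1)[0][0] if left else overall
            rc = Counter(right).most_common(1)[0][0] if right else overall
            err = sum(lab != lc for lab in left) + sum(lab != rc for lab in right)
            if best is None or err < best[0]:
                best = (err, j, t, lc, rc)
    _, j, t, lc, rc = best
    return lambda row: lc if row[j] <= t else rc

def bagging(X, y, n_models, rng):
    """Bagging: train each sub-model on a bootstrap sample drawn with replacement."""
    n, models = len(X), []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(train_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models

def plurality_vote(models, row):
    """Class predicted by the most sub-models (ties go to the class seen first)."""
    return Counter(m(row) for m in models).most_common(1)[0][0]

def make_cost(preds, y, lam):
    """ASSUMED PSO objective (lower is better): plurality-vote error of the selected
    sub-models minus lam * their mean pairwise disagreement rate.
    preds[k][i] is sub-model k's prediction for validation sample i."""
    n, K = len(y), len(preds)
    dis = [[sum(a != b for a, b in zip(preds[p], preds[q])) / n for q in range(K)]
           for p in range(K)]  # pairwise disagreement rates, computed once
    def cost(mask):
        idx = [k for k in range(K) if mask[k]]
        if len(idx) < 2:
            return float("inf")  # diversity is undefined for fewer than two members
        votes = [Counter(preds[k][i] for k in idx).most_common(1)[0][0] for i in range(n)]
        error = sum(v != t for v, t in zip(votes, y)) / n
        pairs = [(a, b) for p, a in enumerate(idx) for b in idx[p + 1:]]
        return error - lam * sum(dis[a][b] for a, b in pairs) / len(pairs)
    return cost

def binary_pso(dim, cost, rng, n_particles=20, iters=40, w=0.7, c1=1.5, c2=1.5, vmax=4.0):
    """Binary PSO with a sigmoid transfer function; bit d = 1 keeps sub-model d."""
    pos = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest, pcost = [p[:] for p in pos], [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda k: pcost[k])
    gbest, gcost = pbest[g][:], pcost[g]
    for _ in range(iters):
        for k in range(n_particles):
            for d in range(dim):
                v = (w * vel[k][d] + c1 * rng.random() * (pbest[k][d] - pos[k][d])
                     + c2 * rng.random() * (gbest[d] - pos[k][d]))
                vel[k][d] = max(-vmax, min(vmax, v))
                # sigmoid(velocity) is the probability that the bit is set to 1
                pos[k][d] = 1 if rng.random() < 1 / (1 + math.exp(-vel[k][d])) else 0
            c = cost(pos[k])
            if c < pcost[k]:  # update personal best, then global best
                pbest[k], pcost[k] = pos[k][:], c
                if c < gcost:
                    gbest, gcost = pos[k][:], c
    return gbest, gcost

def psobag(X_tr, y_tr, X_val, y_val, n_models=15, lam=0.1, seed=0):
    """Bag sub-models, then let binary PSO choose the subset used for voting.
    ASSUMPTION: the PSO objective is evaluated on a separate validation set."""
    rng = random.Random(seed)
    models = bagging(X_tr, y_tr, n_models, rng)
    preds = [[m(row) for row in X_val] for m in models]
    mask, best = binary_pso(n_models, make_cost(preds, y_val, lam), rng)
    selected = [m for m, bit in zip(models, mask) if bit] or models
    return models, selected, best

# HYPOTHETICAL toy data (not the thesis's NMR data): 3 features, noisy linear rule.
def make_toy_data(n, rng):
    X = [[rng.random() for _ in range(3)] for _ in range(n)]
    return X, [int(r[0] + 0.5 * r[1] + rng.gauss(0, 0.1) > 0.75) for r in X]

data_rng = random.Random(42)
X_tr, y_tr = make_toy_data(100, data_rng)
X_val, y_val = make_toy_data(60, data_rng)
X_te, y_te = make_toy_data(200, data_rng)
models, selected, best_cost = psobag(X_tr, y_tr, X_val, y_val)

def acc(ms):
    return sum(plurality_vote(ms, r) == t for r, t in zip(X_te, y_te)) / len(y_te)

print(f"kept {len(selected)}/{len(models)} sub-models; test accuracy "
      f"all={acc(models):.2f}, selected={acc(selected):.2f}")
```

Setting lam = 0 reduces the selection to pure validation-error minimization; larger values favor subsets whose members disagree more. The abstract does not state how the thesis balances the two terms, which data the objective is evaluated on, or how its PSO is parameterized, so these choices should be checked against the full text.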
[Degree-granting institution]: Central China Normal University
[Degree level]: Master's
[Year degree conferred]: 2016
[Classification number]: R734.2

Article ID: 2242018

Link to this article: https://www.wllwen.com/yixuelunwen/zlx/2242018.html

