
Establishment of a Lung Cancer Diagnostic Model Based on Epidemiology, Clinical Symptoms, Tumor Markers, and Imaging Features

Published: 2018-08-02 19:05
Abstract: Lung cancer is a complex disease involving genetic and epigenetic alterations and is the leading cause of cancer death worldwide. In recent years the incidence of lung cancer in China has risen markedly, making it a major public health challenge. Although better medical care and advances in clinical drug therapy have improved treatment outcomes, survival and overall prognosis for patients with advanced lung cancer remain relatively poor. Improving the efficiency of early diagnosis is therefore the key to improving prognosis.

Epidemiological studies show a strong statistical association between lung cancer and smoking. There are an estimated 1.25 billion smokers worldwide, and more than one million people die each year from tobacco-related lung cancer. Smoking is closely linked to the onset and progression of lung cancer: 85% of lung cancers are related to smoking, with a 2-year survival rate of 10%. The American Lung Cancer Screening Center recommends lung cancer screening for people with a smoking history of 30 pack-years who quit within the past 15 years.

Tumor markers are biochemical substances that reflect the presence of a tumor. They are absent from normal adult tissue or found only in embryonic tissue, and their levels in tumor tissue are markedly higher than in normal tissue. Quantitative or qualitative changes in these markers may indicate the nature of a tumor, shedding light on tumor histology, cell differentiation, and cell function, and supporting diagnosis, classification, prognosis, and treatment. However, the sensitivity and specificity of a single tumor marker are usually lower than those of a panel of markers with different, complementary properties. Combined detection of multiple tumor markers is therefore widely used to improve the detection of early lung cancer.

Imaging is an important method for the clinical diagnosis of lung cancer, but the low sensitivity of chest X-ray is also a major cause of delayed diagnosis. The US National Lung Screening Trial recently showed that low-dose CT screening in people at high risk reduced lung cancer mortality by 20% compared with X-ray. The examination is therefore recommended by the US Preventive Services Task Force, the American Cancer Society, and other advisory bodies. Low-dose CT is now used to examine patients strongly suspected of having lung cancer, and its high sensitivity helps detect and confirm early disease. The specificity of CT for lung cancer diagnosis, however, is poor. Tumor markers show lower sensitivity than dynamic CT imaging but higher specificity. Combining CT imaging with tumor markers can therefore help distinguish lung cancer from benign lung disease.

Data mining techniques have proven able, as modeling tools, to absorb information from multiple sources, analyze it accurately, and build complex models, and many studies now combine tumor characteristics with data mining to diagnose tumors. Lung cancer diagnosis involves many factors with complex interrelationships. Data mining techniques can learn fuzzy judgments that cannot be described by explicit mathematical formulas and can handle complex, uncertain, and nonlinear problems. They perform especially well on nonlinear problems and unknown data distributions when samples are large, data are multimedia, and variables are many.

Objective: Building on the research group's earlier work, this study combines serum tumor markers with epidemiological data, clinical symptoms, and imaging features and uses data mining techniques to build an auxiliary diagnostic model that distinguishes lung cancer from benign lung disease. The aim is to further improve the accuracy of lung cancer diagnosis, to provide a reference and auxiliary method for diagnosis, and to improve the survival and prognosis of lung cancer patients.

Methods:
1. Serum samples were collected from 423 inpatients in the Department of Respiratory Medicine of the First Affiliated Hospital of Zhengzhou University between October 2014 and March 2016, and serum tumor marker levels were measured. Epidemiological and clinical information was extracted from inpatient records that had been revised and completed by resident or attending physicians. The information comprised sex, age, smoking history, drinking history, and family history of tumors, together with the presence or absence of cough, expectoration, blood-streaked sputum, fatigue, fever and sweating, and hoarseness.
2. Fisher discriminant analysis and logistic regression analysis were used to screen and optimize the serum tumor marker, epidemiological, and clinical symptom indicators.
3. Lung cancer diagnostic models were built from the optimized indicators using data mining techniques (ANN, SVM, and the C5.0 decision tree) and Fisher discriminant analysis.
4. CT imaging data were also collected for the 423 patients, and the CT images of 214 patients were selected for study according to the inclusion and exclusion criteria.
5. Three senior attending physicians in respiratory medicine each assessed the CT images of the 214 patients, extracting and scoring 19 features. The final score for each imaging feature was the mean of the three physicians' scores.
6. The 19 imaging indicators were screened and optimized by Fisher discriminant analysis and stepwise logistic regression analysis, and lung cancer diagnostic models were built using data mining techniques (ANN, SVM, and the C5.0 decision tree) and Fisher discriminant analysis.
7. Fisher discriminant analysis and stepwise logistic regression analysis were used to screen and optimize the full set of serum tumor marker, epidemiological, clinical symptom, and CT imaging indicators, and lung cancer diagnostic models were built using data mining techniques (ANN, SVM, and the C5.0 decision tree) and Fisher discriminant analysis.

Results:
1. On the prediction set, the models built from tumor markers combined with epidemiological and clinical indicators achieved markedly higher sensitivity, specificity, accuracy, positive predictive value, negative predictive value, and AUC than the models built from tumor markers alone.
2. Among the models built from tumor marker, epidemiological, and clinical symptom indicators, the ANN model had higher sensitivity, specificity, accuracy, positive predictive value, negative predictive value, and AUC than the other three models, and the differences in area under the ROC curve were statistically significant (P < 0.05).
3. The AUCs of the ANN models built from the different indicator sets that combined tumor markers with epidemiological and clinical symptom data did not differ significantly. However, the ANN model built from the 13 indicators retained after stepwise logistic regression on all 10 tumor markers and all epidemiological and clinical symptom indicators reached a training-set accuracy of 100%, a prediction-set accuracy of 94.33%, a specificity of 95.5%, and a positive predictive value of 93.8%, all higher than the other models. The 13 indicators were age, sex, smoking history, expectoration, blood-streaked sputum, fever and sweating, DNMT3B, DNMT1, HDAC1, gastrin, NSE, CEA, and calcium ions.
4. The SVM model built from the three variables selected by stepwise logistic regression analysis (cavity sign, spinous protuberance sign, and tracheal stenosis) achieved, on the prediction set, a sensitivity of 92.3%, specificity of 81.8%, accuracy of 86.9%, positive predictive value of 90.6%, negative predictive value of 91.8%, and AUC of 0.857.
5. When serum tumor markers, epidemiological data, clinical symptoms, and imaging were combined, the SVM model built from the 16 indicators selected by stepwise logistic regression analysis predicted lung cancer with a specificity of 95.5%, accuracy of 97.2%, positive predictive value of 95.4%, and AUC of 0.969. Its sensitivity and negative predictive value were 99.0% and 95.4%.
6. The SVM and C5.0 decision tree models built from combined serum tumor marker, epidemiological, clinical symptom, and imaging indicators diagnosed lung cancer better than the SVM and C5.0 decision tree models built from imaging alone, and the differences in AUC were statistically significant (P < 0.05).

Conclusions:
1. When epidemiological, clinical symptom, and serum tumor marker indicators are each screened and optimized by Fisher discriminant analysis and stepwise logistic regression analysis and then combined to build an ANN diagnostic model for lung cancer, the model's sensitivity, specificity, accuracy, positive predictive value, negative predictive value, and AUC are markedly higher than those of data mining models built from serum tumor markers alone, so it gives better auxiliary clinical diagnosis of lung cancer.
2. The SVM diagnostic model built from the three variables selected by logistic regression analysis (cavity sign, spinous protuberance sign, and tracheal stenosis) can serve as one method for the clinical imaging diagnosis of lung cancer.
3. The SVM and C5.0 decision tree models built from combined serum tumor marker, epidemiological, clinical symptom, and imaging indicators diagnose lung cancer better than the SVM and C5.0 decision tree models built from imaging alone and can serve as a preferred method for auxiliary clinical diagnosis of lung cancer.
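The abstract compares models by sensitivity, specificity, accuracy, positive predictive value, negative predictive value, and AUC. As a reading aid, the minimal Python sketch below shows how these indices are conventionally computed from confusion-matrix counts and classifier scores. It is not the thesis's code: the function names `diagnostic_metrics` and `auc` are hypothetical, the toy counts and scores are invented for illustration, and the AUC uses the rank-based (Mann-Whitney) formulation, which may differ from the software the author used.

```python
from typing import Dict, Sequence


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard diagnostic indices from confusion-matrix counts.

    Assumes every denominator is non-zero (at least one true case,
    one true non-case, one positive call, and one negative call).
    """
    return {
        "sensitivity": tp / (tp + fn),  # true cases correctly flagged
        "specificity": tn / (tn + fp),  # non-cases correctly cleared
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),  # positive predictive value
        "npv": tn / (tn + fn),  # negative predictive value
    }


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen positive case (label 1)
    scores higher than a randomly chosen negative case (label 0);
    ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, NOT taken from the thesis.
toy = diagnostic_metrics(tp=45, fp=5, tn=40, fn=10)
toy_auc = auc([0.9, 0.8, 0.35, 0.6, 0.2, 0.1], [1, 1, 1, 0, 0, 0])
```

Because positive and negative predictive values depend on the proportion of cancer cases in the evaluated sample, they are best read alongside the case mix of the prediction set, whereas sensitivity and specificity do not depend on that proportion.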
Degree-granting institution: Zhengzhou University
Degree level: Doctoral
Year degree conferred: 2016
Classification number: R734.2




Article ID: 2160431


Article link: https://www.wllwen.com/yixuelunwen/zlx/2160431.html

