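Research on a Recognition Model for Benign and Malignant Breast Tumors Based on Raman Spectroscopy
Published: 2018-05-04 18:11
Topics: breast cancer + Raman spectroscopy; Source: Northeast Normal University, 2017 master's thesis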
[Abstract]: Breast cancer is one of the most common cancers among women worldwide, and its incidence is rising year by year. Raman spectroscopy can characterize and explain changes in tissue composition at the molecular level, and it offers high sensitivity and non-destructive measurement for disease diagnosis and in situ detection of living tissue. However, Raman spectral data are high-dimensional and the measurements contain noise, so using them directly to distinguish benign from malignant breast tumors is difficult. A model that can discriminate benign from malignant breast tumors is therefore urgently needed to support more targeted treatment. Building a recognition model by applying machine learning algorithms to Raman spectral data raises the tumor recognition rate and improves the efficiency of manual consultation, leading to better treatment outcomes. This thesis collected Raman spectral data from 168 female samples; the samples were provided by the Department of Breast Surgery, the First Hospital of Jilin University. The collected spectra are complex, with high dimensionality and a small sample size, so building a classification model on them directly is prone to overfitting. Therefore, drawing on researchers' previous work, representative characteristic peaks of the Raman spectra of benign and malignant breast tissue were summarized; studies have shown that these peaks can characterize the changes in tissue composition that occur when breast tissue becomes diseased. After this step the data dimensionality is reduced, and classification models were built with support vector machine (SVM), extreme learning machine (ELM), and K-nearest neighbor (KNN) methods.
Experiments showed that models built on the summarized peaks achieved classification accuracies ranging from 51.67% to 85.00%, and that the models were clearly biased toward the malignant class, so the classification objective was not clearly met. To solve these problems, feature selection and feature extraction were used to find the optimal feature subset, aiming at higher classification accuracy and a more stable model. Sequential forward selection (SFS), Relief-F, and joint sparse discriminant analysis (JSDA) were each used to analyze the characteristic peaks of breast tissue and find the optimal feature subset. Models were then built with each of the classification methods above. The results show that classification models built on feature subsets chosen by feature selection and feature extraction predict more accurately than models built on all characteristic peaks. Among them, the model based on KNN and JSDA (KNN-JSDA) achieved the best classification accuracy: 93.12% for identifying benign versus malignant breast tumors. The Kappa coefficient of the KNN-JSDA model is 0.84, indicating that the classification results have reference value. These results show that the KNN-JSDA model has good recognition ability and can distinguish benign from malignant breast tumors.
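The workflow summarized in the abstract (select a subset of characteristic peaks, classify with KNN, then report accuracy and a Kappa coefficient) can be illustrated with a minimal pure-Python sketch. This is not the thesis's implementation: the toy intensity values, the function names, and the use of leave-one-out 1-NN accuracy as the selection criterion are all assumptions made for illustration, and only sequential forward selection is shown (Relief-F and JSDA are not).

```python
from math import dist


def loo_knn_predict(X, y, features, k=1):
    """Leave-one-out k-NN predictions using only the given feature indices."""
    preds = []
    for i, xi in enumerate(X):
        a = [xi[f] for f in features]
        # Distances from sample i to every other sample (sample i is held out).
        neighbours = sorted(
            (dist(a, [xj[f] for f in features]), y[j])
            for j, xj in enumerate(X) if j != i
        )[:k]
        votes = [label for _, label in neighbours]
        preds.append(max(set(votes), key=votes.count))
    return preds


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), undefined when p_e == 1."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)  # observed agreement
    # Agreement expected by chance, from the marginal label frequencies.
    p_e = sum((y_true.count(c) / n) * (y_pred.count(c) / n)
              for c in set(y_true) | set(y_pred))
    return (p_o - p_e) / (1 - p_e)


def forward_select(X, y, n_select, k=1):
    """Greedy sequential forward selection scored by leave-one-out k-NN accuracy."""
    selected, remaining = [], list(range(len(X[0])))
    while len(selected) < n_select:
        best_f, best_acc = None, -1.0
        for f in remaining:
            acc = accuracy(y, loo_knn_predict(X, y, selected + [f], k))
            if acc > best_acc:  # strict '>' keeps the earliest feature on ties
                best_f, best_acc = f, acc
        selected.append(best_f)
        remaining.remove(best_f)
    return selected


# Hypothetical toy data: 8 samples x 3 "peak intensities"; 0 = benign, 1 = malignant.
# Only the first column separates the classes; the other two are noise.
X = [
    [1.0, 0.3, 2.0], [1.2, 0.9, 2.1], [0.9, 0.5, 1.9], [1.1, 0.7, 2.2],
    [5.0, 0.4, 2.0], [5.2, 0.8, 2.1], [4.9, 0.6, 1.9], [5.1, 0.2, 2.2],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

selected = forward_select(X, y, n_select=2)
y_pred = loo_knn_predict(X, y, selected)
print("selected features:", selected)
print("accuracy:", accuracy(y, y_pred), "kappa:", cohen_kappa(y, y_pred))
```

Kappa corrects accuracy for the agreement expected by chance, which is why it is a useful complement when a model is biased toward one class, as the abstract reports for the models built on all characteristic peaks.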
[Degree-granting institution]: Northeast Normal University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: R737.9; TP181
Article link: https://www.wllwen.com/kejilunwen/zidonghuakongzhilunwen/1844051.html