Automated Medical Diagnosis Based on Supervised Manifold Dimensionality Reduction

Published: 2018-11-19 09:01

[Abstract]: With the development of computer technology, human society has entered the information age, and medical diagnosis inevitably involves large volumes of high-dimensional data. Traditional medical diagnosis is strongly influenced by subjective human factors; its accuracy is relatively low and it is time-consuming. Studies show that automated medical diagnosis achieves higher diagnostic accuracy and can reduce the misdiagnosis rate. Automated medical diagnosis is not yet widely used. Traditional expert systems rely on databases for diagnosis and can be understood by medical practitioners, but the data collected in those databases are heterogeneous and highly redundant, so diagnostic accuracy is low. Support vector machine (SVM) classification can classify the collected medical information, which alleviates the limitations of expert-system databases to some extent and improves diagnostic accuracy. However, SVM classification suffers from a black-box effect: it cannot explain its reasoning process or how it reaches its conclusions, users cannot see the processing directly, and its comprehensibility is poor. Manifold dimensionality reduction algorithms from machine learning can project high-dimensional data into a low-dimensional space that can be visualized. Visualizing the intermediate process makes the data easier for medical practitioners to understand and analyze, and offers guidance for diagnosis. Many dimensionality reduction algorithms have been applied to automated medical diagnosis, but manifold dimensionality reduction algorithms can only reduce the dimensionality of medical information and cannot classify it.

This thesis proposes reducing dimensionality first and classifying afterwards to handle high-dimensional medical data. A visualized low-dimensional mapping combined with a linear classification decision surface helps improve comprehensibility. The manifold dimensionality reduction step preprocesses the large volume of medical data, reducing its redundancy and improving the precision of computation and analysis. The thesis studies manifold dimensionality reduction and classification techniques in depth. Its work and main results are:

1. A manifold dimensionality reduction and classification algorithm based on isometric mapping (the SIMBA algorithm). SIMBA incorporates supervised information into the ISOMAP algorithm, extracts features from high-dimensional medical data, classifies the reduced results with a decision tree, and supports extension to test data. Visualizing the intermediate process enhances comprehensibility and makes the results easier for medical practitioners to understand. Tests on real medical data sets show that the improved SIMBA algorithm achieves higher classification accuracy.
2. A dimensionality reduction and classification algorithm (the DLLEA algorithm) based on the locally linear embedding (LLE) algorithm. DLLEA incorporates supervised information into LLE, classifies the reduced results with a linear support vector machine, and supports extension to test data. Tests on real medical data sets show that DLLEA achieves higher classification accuracy.
3. A supervised dimensionality reduction and classification algorithm (the SLSE algorithm) based on the local spline embedding (LSE) algorithm. The basic idea of SLSE is to incorporate supervised information into local spline embedding to reduce the dimensionality of high-dimensional data, and to classify newly added unlabeled data with the KNN algorithm. SLSE combines the LSE algorithm with the LDA algorithm to produce an explicit linear mapping, which gives the low-dimensional projection of the data points on the manifold.
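The abstract does not say how supervised information is incorporated into the neighbourhood-based methods (ISOMAP, LLE, LSE). As a rough illustration only, the sketch below uses one rule known from the supervised-LLE literature: distances between samples with different class labels are inflated by `alpha * max(D)` before the k nearest neighbours are chosen. The rule, the parameter `alpha`, the function names, and the toy data are all assumptions made for this example and are not taken from the thesis.

```python
import math


def pairwise_distances(points):
    """Euclidean distance matrix for a list of equal-length feature vectors."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def supervised_distances(points, labels, alpha=0.5):
    """Inflate distances between samples that carry different labels.

    Assumed rule (SLLE-style, not confirmed by the thesis):
        D'[i][j] = D[i][j] + alpha * max(D) * (labels[i] != labels[j])
    alpha = 0 reproduces the unsupervised distances.
    """
    dist = pairwise_distances(points)
    max_dist = max(max(row) for row in dist)
    n = len(points)
    return [
        [dist[i][j] + (alpha * max_dist if labels[i] != labels[j] else 0.0)
         for j in range(n)]
        for i in range(n)
    ]


def k_nearest_neighbours(dist, k):
    """Indices of the k nearest neighbours of each sample, excluding itself."""
    neighbours = []
    for i, row in enumerate(dist):
        order = sorted((j for j in range(len(row)) if j != i), key=lambda j: row[j])
        neighbours.append(order[:k])
    return neighbours


# Hypothetical toy data: two classes lying along a line.
points = [(0.0,), (1.0,), (1.2,), (2.2,)]
labels = [0, 0, 1, 1]

unsupervised = k_nearest_neighbours(pairwise_distances(points), k=1)
supervised = k_nearest_neighbours(supervised_distances(points, labels, alpha=0.5), k=1)
```

With the plain distances, samples 1 and 2 pick each other as nearest neighbours even though they belong to different classes. With the label-aware distances, each sample's nearest neighbour comes from its own class, so a neighbourhood graph built from them tends to keep the classes apart before the embedding and the classifier are applied.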
[Degree-granting institution]: Yangzhou University
[Degree level]: Master's
[Year degree awarded]: 2015
[Classification number]: TP391.7; R44

[References]