Machine-Learning-Based Prediction and Analysis of Ovarian Tumors
Topics: machine learning + data mining; Source: Jilin University, master's thesis, 2016
[Abstract]: With the rapid development of information technology in the 21st century, computers play an increasingly important role in society. As hospitals have adopted hospital information systems and electronic medical records, and as data storage technology has matured, hospital databases have accumulated data on a large scale. Most hospitals, however, still handle these data only through basic insert, delete, update, and query operations. They lack data integration and analysis capabilities, and so cannot use the data they already hold to support medical decisions or to derive knowledge automatically. Traditional analysis methods, for their part, can no longer uncover the hidden information and internal relationships in data of this volume. Data collection and storage have both advanced rapidly; the main open problem is how to put these hard-won data to practical use.

After reviewing the theoretical foundations of data mining, this thesis first applies them to screen and preprocess the key medical data collected for evaluating ovarian tumors. Four classifiers suited to medical data mining are then selected. The support vector machine (SVM) performs well on pattern-recognition problems involving small sample sets, nonlinear sample sets, and high-dimensional data, and can be extended to other problems such as function fitting. The naive Bayes classifier has a solid mathematical foundation, gives stable classification results, needs only a small number of training samples, is insensitive to missing data, and is algorithmically simple. The nearest neighbor classifier works well on sample sets whose class regions cross or overlap heavily. The random forest algorithm produces highly accurate classifiers for many kinds of data, handles a large number of input variables, and trains quickly. In addition, an artificial neural network is designed specifically for the collected data; its self-learning ability, fast search for optimal solutions, and associative memory make it effective for building classification algorithms.

The five algorithms are each used for classification and prediction. The experimental results are checked with statistical tests, and their accuracy is compared with that reported in domestic and international studies. The results are interpreted from a machine learning perspective, and the overall performance of the algorithms is evaluated. From this analysis, classification rules for clinical ovarian tumor data are extracted, with the aim of predicting ovarian cancer early to assist clinical diagnosis, so that early prediction and early treatment can improve patient survival.
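The comparison described in the abstract (several classifiers evaluated on the same data) can be sketched as follows. This is only an illustrative sketch, not the thesis's implementation: the abstract does not give the dataset, features, or hyperparameters, so the code uses synthetic two-feature data as a stand-in, covers only two of the five algorithms (k-nearest neighbors and Gaussian naive Bayes), and all function names (`make_toy_data`, `knn_predict`, `fit_gaussian_nb`, `nb_predict`, `cross_val_accuracy`) are hypothetical. It relies on the Python standard library only.

```python
import math
import random
from collections import Counter
from statistics import mean, pstdev

def make_toy_data(n_per_class=60, seed=0):
    """Synthetic two-feature samples (assumption: stands in for real tumor data)."""
    rng = random.Random(seed)
    data = []
    for label, center in ((0, (0.0, 0.0)), (1, (4.0, 4.0))):
        for _ in range(n_per_class):
            x = [rng.gauss(c, 1.0) for c in center]
            data.append((x, label))
    rng.shuffle(data)
    return data

def knn_predict(train, x, k=5):
    """Majority vote among the k training points closest to x (Euclidean distance)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def fit_gaussian_nb(train):
    """Estimate a prior and per-feature mean/std for each class."""
    model = {}
    for label in {y for _, y in train}:
        rows = [x for x, y in train if y == label]
        cols = list(zip(*rows))
        model[label] = (
            len(rows) / len(train),                 # class prior
            [mean(c) for c in cols],                # feature means
            [max(pstdev(c), 1e-6) for c in cols],   # feature stds, floored
        )
    return model

def nb_predict(model, x):
    """Return the class with the highest log posterior, assuming independent features."""
    def log_posterior(label):
        prior, mus, sigmas = model[label]
        score = math.log(prior)
        for v, mu, s in zip(x, mus, sigmas):
            score += (-math.log(s) - 0.5 * math.log(2 * math.pi)
                      - 0.5 * ((v - mu) / s) ** 2)
        return score
    return max(model, key=log_posterior)

def cross_val_accuracy(data, predict_factory, folds=5):
    """k-fold cross-validation; returns the mean accuracy over the folds."""
    scores = []
    for i in range(folds):
        test = data[i::folds]
        train = [row for j, row in enumerate(data) if j % folds != i]
        predict = predict_factory(train)  # fit on the training folds only
        correct = sum(predict(x) == y for x, y in test)
        scores.append(correct / len(test))
    return mean(scores)

def knn_factory(train):
    return lambda x: knn_predict(train, x)

def nb_factory(train):
    model = fit_gaussian_nb(train)
    return lambda x: nb_predict(model, x)

data = make_toy_data()
knn_acc = cross_val_accuracy(data, knn_factory)
nb_acc = cross_val_accuracy(data, nb_factory)
print(f"kNN mean accuracy:         {knn_acc:.3f}")
print(f"Naive Bayes mean accuracy: {nb_acc:.3f}")
```

Evaluating every classifier on the same folds keeps the comparison fair; a statistical test on the per-fold scores, as the abstract mentions, would be the natural next step before claiming one model is better than another.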
[Degree-granting institution]: Jilin University
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP181; TP311.13
Article ID: 1799146
Article link: https://www.wllwen.com/kejilunwen/zidonghuakongzhilunwen/1799146.html