Research on Disease Risk Analysis Methods Based on Multi-Label Health Examination Data
Published: 2019-03-15 21:01
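【Abstract】: Health examinations are a very important part of disease prevention. From an individual's examination results, doctors can identify potential conditions in time and then give health guidance. Traditionally, the results are analyzed by experienced doctors, who assess overall health status and disease risk from the results for each part of the body. Data volumes keep growing and doctors' experience varies widely, so manual analysis can no longer meet the rising demand for examinations in either efficiency or accuracy. With the development of data mining, artificial intelligence and machine learning methods have been widely applied to computer-aided medical diagnosis and disease risk analysis. Data preprocessing is one of the key steps in machine learning. In medical examination data, results often differ considerably between individuals. For a given feature, the standard deviation of its values across the population is relatively large, and far more values fall below the mean than above it, so the distribution is extremely uneven. Conventional normalization methods do not handle this well. A mathematical transformation can largely resolve the problem and, to some extent, improve both the convergence speed and the accuracy of the model.

The main contributions of this thesis are:
(1) an FN (Fusion normalization) method that stabilizes the feature distributions and normalizes feature values into the interval (0,1);
(2) for the multi-label problem, three combined models, SVMs, GBDTs and LRs, built on SVM, GBDT and LR base classifiers respectively, to handle medical multi-label data;
(3) for the class imbalance that arises because normal individuals outnumber abnormal ones on examination indicators, a different penalty factor for each label, set according to that label's class ratio.

The dataset contains 62 features, such as sex and fasting blood glucose, and 3 labels: hypertension, diabetes and fatty liver. Its data types are string (categorical) and numeric. Experimental results show that data processed with the FN method achieves higher accuracy, to varying degrees, on all three combined models SVMs, GBDTs and LRs. The comparison baselines are unnormalized data, Max_min normalization and standard normalization.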
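The abstract does not give the FN formula, so the sketch below cannot reproduce it. It can only illustrate the problem FN targets. On a right-skewed feature, where most people sit below the mean, Max_min normalization squeezes the majority into a narrow band near 0. A "mathematical transformation" can spread them out before mapping into (0, 1). Here that transformation is an assumed log1p followed by a logistic squash. The functions `max_min`, `standard`, `log_logistic` and `skewness` are hypothetical names, `log_logistic` is a stand-in and not the thesis's FN method, and the sample values are invented.

```python
import math
import statistics


def max_min(xs):
    """Max_min normalization: linear map of [min, max] onto [0, 1]."""
    lo, hi = min(xs), max(xs)
    span = hi - lo
    return [(x - lo) / span if span else 0.0 for x in xs]


def standard(xs):
    """Standard (z-score) normalization: zero mean, unit population std."""
    mu, sd = statistics.mean(xs), statistics.pstdev(xs)
    return [(x - mu) / sd if sd else 0.0 for x in xs]


def log_logistic(xs):
    """Hypothetical stabilize-then-squash transform (NOT the thesis's FN).

    log1p compresses the long right tail, z-scoring centres the result,
    and the logistic function maps every value into the open interval (0, 1).
    Assumes non-negative measurements.
    """
    return [1.0 / (1.0 + math.exp(-z)) for z in standard([math.log1p(x) for x in xs])]


def skewness(xs):
    """Population skewness; a value > 0 indicates a long right tail."""
    mu, sd = statistics.mean(xs), statistics.pstdev(xs)
    return sum(((x - mu) / sd) ** 3 for x in xs) / len(xs)


# Made-up, right-skewed toy values (not thesis data): most values sit
# below the mean, a few sit far above it.
values = [4.8, 5.0, 5.1, 5.2, 5.3, 5.5, 5.6, 6.0, 9.5, 16.0]

for name, fn in [("Max_min", max_min), ("standard", standard), ("log+logistic", log_logistic)]:
    out = fn(values)
    print(f"{name:13s} median={statistics.median(out):.3f} "
          f"range=({min(out):.3f}, {max(out):.3f})")
print("skewness raw:", round(skewness(values), 2),
      "after log1p:", round(skewness([math.log1p(v) for v in values]), 2))
```

On these toy values, eight of the ten points lie below the mean. Max_min places all eight within roughly the lowest 11% of the [0, 1] range. The log-plus-logistic variant moves them toward the middle of (0, 1) and lowers the skewness. Whether FN behaves the same way depends on its actual definition, which the abstract does not state.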
[Abstract]: Health check-ups are a very important part of disease prevention. Doctors can use an individual's check-up results to identify underlying conditions and then provide health guidance. Traditionally, experienced doctors analyze the results for each part of the body and give an overall assessment of health status and disease risk. Data volumes are growing and doctors' experience varies, so manual analysis cannot meet the increasing demand for check-ups in terms of either efficiency or accuracy. With the development of data mining technology, artificial intelligence and machine learning methods have been widely used in computer-aided medical diagnosis and disease risk analysis. Data preprocessing is one of the important steps in machine learning. In medical check-up data, results often differ considerably between individuals. For a given feature, the standard deviation of its values across the whole population is relatively large, and far more values fall below the mean than above it. The data distribution is therefore extremely uneven. Traditional data normalization methods do not handle this problem well. A mathematical transformation can solve it and, to a certain extent, improve the convergence speed and precision of the model.

The main work of this thesis is as follows:
(1) an FN (Fusion normalization) method is proposed to stabilize the features and normalize the feature values to (0,1);
(2) for the multi-label problem, three combined models, SVMs, GBDTs and LRs, are built on SVM, GBDT and LR base classifiers to handle medical multi-label data;
(3) for the data imbalance that arises because the normal population is larger than the abnormal population, a different penalty factor is set for each label according to the ratio of that label's classes.

The dataset contains 62 features, such as gender and fasting blood glucose, and 3 labels: hypertension, diabetes and fatty liver. The data types in the dataset are character (string) and numeric. The experimental results show that FN (Fusion normalization) improves accuracy to some extent on the combined models SVMs, GBDTs and LRs. The comparison is against non-normalized data, Max_min normalization and standard normalization.
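The abstract states that the thesis handles multi-label data by building combined models from per-label base classifiers. It also sets a different penalty factor for each label from that label's class ratio. The sketch below illustrates that combination under several assumptions:

- It uses binary relevance, meaning one independent binary classifier per label.
- A small hand-written logistic regression stands in for the LR base classifier only. SVM and GBDT are not shown.
- It applies an assumed weighting rule in which each label's positive (abnormal) samples are weighted by n_negative / n_positive.

The names `penalty_factor`, `fit_weighted_lr`, `fit_binary_relevance` and `predict` are hypothetical, and the toy data are invented.

```python
import math

LABELS = ["hypertension", "diabetes", "fatty_liver"]  # the three labels named in the abstract


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def penalty_factor(y):
    """Assumed rule: weight positives by n_negative / n_positive for this label."""
    pos = sum(y)
    return (len(y) - pos) / pos if pos else 1.0


def fit_weighted_lr(X, y, pos_weight, lr=1.0, epochs=5000):
    """Batch gradient descent on a class-weighted log-loss.

    The loss (and gradient) of every positive sample is multiplied by
    pos_weight, so mistakes on the rarer abnormal class cost more.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            err = (pos_weight if yi == 1 else 1.0) * (p - yi)
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def fit_binary_relevance(X, Y):
    """Binary relevance: one independent weighted classifier per label."""
    models = {}
    for k, name in enumerate(LABELS):
        y = [row[k] for row in Y]
        models[name] = fit_weighted_lr(X, y, penalty_factor(y))
    return models


def predict(models, x):
    """Return a 0/1 prediction for every label."""
    return {
        name: int(b + sum(wj * xj for wj, xj in zip(w, x)) >= 0)
        for name, (w, b) in models.items()
    }


# Toy, made-up data: two features already scaled to (0, 1),
# e.g. normalized fasting glucose and normalized BMI.
X = [[0.10, 0.10], [0.20, 0.20], [0.10, 0.30], [0.30, 0.10], [0.15, 0.85],
     [0.25, 0.95], [0.85, 0.15], [0.95, 0.25], [0.90, 0.90]]
# Columns follow LABELS: hypertension, diabetes, fatty_liver.
Y = [[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 1],
     [0, 0, 1], [0, 1, 0], [0, 1, 0], [1, 1, 1]]

models = fit_binary_relevance(X, Y)
print({name: penalty_factor([row[k] for row in Y]) for k, name in enumerate(LABELS)})
print(predict(models, [0.90, 0.20]))
```

With the ratio rule, the positives and negatives of each label carry equal total weight. This plays a role similar to a per-class penalty C in an SVM. It also discourages the classifier for a rare label, such as hypertension in the toy data, from predicting 0 everywhere. The actual penalty values and base classifiers used in the thesis are not given in the abstract.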
【Degree-granting institution】: Zhengzhou University
【Degree level】: Master's
【Year conferred】: 2017
【Classification number】: R194.3; TP18
Article ID: 2440983
Article link: https://www.wllwen.com/kejilunwen/zidonghuakongzhilunwen/2440983.html