
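Research on Key Methods of Disease Classification Based on Traditional Chinese Medicine Clinical Data

Published: 2017-12-27 20:34

  Source: Southwest Petroleum University, master's thesis, 2017. Document type: degree thesis


  Keywords: TCM clinical data; disease classification; imbalanced data classification; multi-label classification; feature selection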


[Abstract]: With the growing informatization of traditional Chinese medicine (TCM), research on making TCM diagnosis objective has attracted increasing attention in China and abroad. How to make full use of valuable TCM clinical data to provide scientific decision support for TCM diagnosis and treatment, and thereby promote the further development of TCM, has become a research focus. Data mining offers a new approach to these problems, and classification, one of its main research topics, is receiving increasing attention in computer-aided clinical diagnosis for TCM. Feature selection can improve classification performance and also offers a new way to explore the relationship between TCM features and diseases. Based on the actual condition of the TCM clinical data collected, this thesis studies disease classification on clinical data from three key aspects: imbalanced data classification, multi-label classification, and the effect of feature selection on classification. The aim is to improve classification performance and, in turn, the capability of computer-aided diagnosis. The main work is as follows.

First, disease classification on imbalanced data. Working at the data level and taking the actual condition of the TCM clinical data into account, the thesis improves on undersampling. Combining the improved sampling scheme with Asymmetric Bagging, it proposes an improved algorithm, FPUSAB. Experiments show that, compared with Asymmetric Bagging, FPUSAB improves AUC by 10.5% on average and Bacc by 8.4% on average.

Second, disease classification on multi-label data. To address the class imbalance in TCM clinical data and the weakness of ML-kNN in finding neighbors, the thesis introduces granular computing into WML-kNN and proposes an improved algorithm, WML-GkNN. Experiments show that, compared with the algorithm before improvement, WML-GkNN improves on average by 11.2% in Hamming Loss, 5.3% in Avg precision, 2.1% in Coverage, 5.1% in One-Error, and 7.6% in Ranking Loss.

Third, the effect of feature selection on classification. TCM clinical data have many features, which hinders computer-aided diagnosis. For feature selection in imbalanced disease classification, the thesis introduces the prediction risk criterion and proposes the PRFS-FPUSAB algorithm on top of FPUSAB; experiments show that AUC improves by 7.4% on average after feature selection. For multi-label disease classification, the HOML algorithm, which has shown good selection performance on coronary heart disease, is used to select features on the multi-label data. Experiments show that after feature selection the classification metrics improve on average by 17.77% in Hamming Loss, 6.28% in Avg precision, 15.73% in Coverage, 10.21% in One-Error, and 25.22% in Ranking Loss, and that the selected features are consistent with TCM theory on the relevant diseases.
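
The baseline named in the abstract's first contribution, Asymmetric Bagging over undersampled data, can be illustrated roughly as follows. This is only a minimal sketch of the general idea, not the thesis's FPUSAB algorithm, whose improved sampling scheme the abstract does not describe. The nearest-centroid base learner, the function names, the toy data, and the convention that label 1 is the minority class are all assumptions made for illustration. "Bacc" is assumed here to mean balanced accuracy, the mean of sensitivity and specificity.

```python
import random

def centroid(rows):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_asymmetric_bagging(X, y, n_bags=11, seed=0):
    """Train one nearest-centroid model per bag (hypothetical base learner).

    Each bag keeps every minority sample (label 1) and draws an equally
    sized random subset of the majority samples (label 0), so every base
    model is trained on balanced data.
    """
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == 1]
    majority = [x for x, label in zip(X, y) if label == 0]
    k = min(len(minority), len(majority))
    models = []
    for _ in range(n_bags):
        sampled = rng.sample(majority, k)  # undersample the majority class
        models.append((centroid(sampled), centroid(minority)))
    return models

def predict(models, x):
    """Majority vote over the bagged nearest-centroid models."""
    votes = sum(1 for c0, c1 in models if sq_dist(x, c1) < sq_dist(x, c0))
    return 1 if votes * 2 > len(models) else 0

def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity; assumes both classes occur."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    tpr = sum(1 for p in pos if p == 1) / len(pos)
    tnr = sum(1 for p in neg if p == 0) / len(neg)
    return (tpr + tnr) / 2

# Toy imbalanced data: 40 majority points near the origin, 5 minority
# points near (3, 3). The values are illustrative, not from the thesis.
X = [[(i % 5) * 0.1, (i // 5) * 0.1] for i in range(40)]
X += [[3 + 0.1 * i, 3 + 0.1 * i] for i in range(5)]
y = [0] * 40 + [1] * 5

models = train_asymmetric_bagging(X, y)
y_pred = [predict(models, x) for x in X]
print(balanced_accuracy(y, y_pred))  # 1.0 on this well-separated toy set
```

Balanced accuracy is reported here instead of plain accuracy because, with an 8:1 class ratio, a model that always predicts the majority class would already reach about 89% accuracy while scoring only 0.5 on balanced accuracy.
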
[Degree-granting institution]: Southwest Petroleum University
[Degree level]: Master's
[Year conferred]: 2017
[Classification number]: R24; TP311.13



