Research on an Improved Association Rule Algorithm for Chronic Disease Data Mining

Published: 2018-05-25 21:59

  Topic: chronic disease + association rules; Source: Zhejiang Sci-Tech University, 2017 master's thesis


[Abstract]: Association rule mining is an important research branch of data mining whose goal is to discover relationships among data items in large volumes of data. Because the mined rules are simple in form and easy to understand, research on and applications of association rule techniques have flourished. China has a large number of chronic disease patients. To make effective use of these patients' medical data and to provide a scientific basis for preventing and controlling chronic diseases, this thesis selects hypertension, one such disease, as the subject of its data mining research. It mainly investigates the correlation between the physical signs of hypertensive patients and their cardiovascular risk level, as well as the association between hypertension and other chronic diseases. The main work is as follows:
(1) Based on a review of the domestic and international literature, the thesis analyzes the state of research on data mining in chronic disease and other medical fields, summarizes the problems currently present in medical data analysis in China, and establishes the main content and approach of the research.
(2) It presents the theory of data mining and association rules, focusing on the Apriori algorithm for association rule mining. It analyzes the algorithm's performance bottlenecks and surveys existing optimization methods, which broadens the options for improving the algorithm.
(3) To address the runtime inefficiency of Apriori, three improvements are made: a clustered matrix stores the transaction database in compressed form, avoiding repeated scans of the original database; a pre-pruning strategy generates fewer candidate itemsets, avoiding the cost of a large number of joins between frequent itemsets; and a constraint on chronic disease type reduces the generation of frequent itemsets and irrelevant rules. Comparative Matlab simulation experiments demonstrate that the improved algorithm effectively reduces the number of candidate itemsets and improves running efficiency.
(4) A chronic disease data mining scheme is designed in which the improved Apriori algorithm is applied to physical examination data from hypertensive patients. The data are preprocessed, minimum support and confidence thresholds are set, and constraints and a correlation measure are specified before association rules are mined. Logistic regression analysis is used to examine the correlations among chronic diseases; comparing its results with the mined rules shows that the two methods agree, which confirms the validity of the experiment. The experiment ultimately yields association rules that are consistent with medical knowledge. These rules can be used to accurately assess the cardiovascular risk level of hypertensive patients and to anticipate complications of their chronic diseases, giving doctors a valuable reference for diagnosis and providing a theoretical basis for automated diagnosis.
(5) A chronic disease data mining system that incorporates the improved Apriori algorithm is developed. The system can uncover knowledge hidden in chronic disease medical data and support doctors' decision-making, and it has some practical value.
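The three ideas in item (3) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not describe the layout of the clustered matrix, so per-item integer bitmasks stand in for it here, and the pruning step is the standard Apriori subset check. The item names, the six transactions, and the thresholds are invented for illustration. For simplicity, the disease-type constraint is applied only when rules are generated, keeping rules whose consequent is the target disease.

```python
from itertools import combinations


def build_columns(transactions):
    """Scan the transactions once and store each item's column as a bitmask.

    Bit r of columns[item] is 1 when transaction r contains the item.
    """
    columns = {}
    for row, items in enumerate(transactions):
        for item in items:
            columns[item] = columns.get(item, 0) | (1 << row)
    return columns


def support_count(itemset, columns):
    """Count transactions containing every item by AND-ing the item columns."""
    mask = -1  # all bits set, so the first AND keeps the first column unchanged
    for item in itemset:
        mask &= columns[item]
    return bin(mask).count("1")


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    n = len(transactions)
    columns = build_columns(transactions)
    min_count = min_support * n
    current = {frozenset([item]) for item in columns
               if support_count([item], columns) >= min_count}
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support_count(itemset, columns) / n
        candidates = set()
        for a, b in combinations(current, 2):
            union = a | b
            # Prune before counting: every k-subset must already be frequent.
            if len(union) == k + 1 and all(
                frozenset(sub) in current for sub in combinations(union, k)
            ):
                candidates.add(union)
        current = {c for c in candidates
                   if support_count(c, columns) >= min_count}
        k += 1
    return frequent


def rules_for_target(frequent, target, min_confidence):
    """Keep only rules of the form antecedent -> target (the constraint)."""
    rules = []
    for itemset, supp in frequent.items():
        if target in itemset and len(itemset) > 1:
            antecedent = itemset - {target}
            confidence = supp / frequent[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, target, supp, confidence))
    return rules


# Hypothetical data: each set lists the conditions recorded for one patient.
transactions = [
    {"hypertension", "obesity", "diabetes"},
    {"hypertension", "obesity"},
    {"hypertension", "diabetes"},
    {"obesity", "smoking"},
    {"hypertension", "obesity", "diabetes"},
    {"smoking"},
]
frequent = frequent_itemsets(transactions, min_support=0.5)
rules = rules_for_target(frequent, target="hypertension", min_confidence=0.7)
for antecedent, consequent, supp, confidence in rules:
    print(sorted(antecedent), "->", consequent, supp, confidence)
```

Because the original transactions are read only once, in build_columns, every later support count is a bitwise AND over the stored columns. The subset check discards the three-item candidate before it is ever counted, since the pair of obesity and diabetes falls below the support threshold.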
[Degree-granting institution]: Zhejiang Sci-Tech University
[Degree level]: Master's
[Year conferred]: 2017
[Classification number]: TP311.13



