An Edit-Distance-Based Sequence Clustering Algorithm and Its Application to Clinical Anomaly Detection
Topics: clinical behavior sequences; sequence clustering. Source: master's thesis, Jiangsu University, 2017.
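[Abstract]: As medical insurance expands to cover more services and more people, medical fraud and other violations have become common. Medicine is highly specialized, and information is asymmetric among the three parties to an insurance transaction (the insurer, the medical provider and the patient). As a result, abnormal medical behavior is largely hidden from insurers. Methods for discovering abnormal clinical behavior therefore have considerable theoretical and practical value for regulating medical practice and preventing fraud.

This thesis builds on research from China and abroad. It analyzes two properties of clinical behavior sequences: their temporal ordering and their natural tendency to form clusters. It clusters a data set of normal clinical behavior sequences with the bisecting K-means algorithm and treats the resulting clusters as normal profiles. Because abnormal behavior in clinical events has specific characteristics, the thesis studies distance-based anomaly detection. It also implements a prototype clinical anomaly detection system that uncovers abnormal behavior hidden in clinical data. The main contributions are as follows:
(1) A bisecting K-means sequence clustering algorithm based on global similarity matching (PSClu) is proposed. PSClu optimizes how bisecting K-means computes distances, using edit distance to measure the similarity between sequences. It skips part of the edit-distance computations by combining three techniques: upper and lower bounds on the edit distance, edit-distance computation over equal-length prefix subsequences, and an approximate method for finding cluster centroids. This lowers the time complexity of bisecting K-means and quickly produces clusters of normal clinical behavior sequences.
(2) A method is proposed for computing the similarity between a sequence under test and a cluster centroid. To detect abnormal clinical behavior sequences more effectively, the comparison of medication events considers drug dosage and price in addition to drug efficacy. Medical actions also differ in importance, so a weighted edit distance algorithm (WED) computes the similarity between the sequence under test and the cluster centroid.
(3) An anomaly detection model is built. PSClu clusters the sequences into normal clusters, which serve as normal profiles. WED then measures how similar a sequence under test is to each cluster centroid. The degree to which the sequence differs from the centroids of the normal clusters is the criterion for deciding whether it is abnormal. The model comprises data preprocessing, cluster generation and similarity judgment.
(4) A prototype system based on the anomaly detection model is designed and implemented, including its web pages, service layer and persistence layer. Its performance is analyzed and evaluated on clinical behavior data from a medical institution.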
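The filtering idea behind contribution (1) can be sketched as follows. This is not PSClu itself, and the abstract gives none of its formulas. The choices below are assumptions made for illustration: the bag-distance lower bound, the position-wise upper bound, a row-minimum DP cutoff standing in for the thesis's equal-length-prefix computation, and a sampled medoid used as the approximate centroid. All function names are likewise hypothetical.

```python
import random
from collections import Counter


def edit_distance(a, b, cutoff=None):
    """Levenshtein distance between two sequences.

    Row minima of the DP table never decrease, so once a whole row exceeds
    `cutoff` the final distance must too; we then return None early.
    """
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
        if cutoff is not None and min(cur) > cutoff:
            return None
        prev = cur
    return prev[-1]


def lower_bound(a, b):
    """Bag distance: never larger than the edit distance (and >= length difference)."""
    ca, cb = Counter(a), Counter(b)
    return max(sum((ca - cb).values()), sum((cb - ca).values()))


def upper_bound(a, b):
    """Substitute mismatches position by position, then insert/delete the tail."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def nearest(seq, centroids):
    """Exact nearest centroid, using the bounds to skip full DP computations."""
    ubs = [upper_bound(seq, c) for c in centroids]
    best_k = min(range(len(centroids)), key=ubs.__getitem__)
    best_d = ubs[best_k]  # only an upper bound until best_k is evaluated exactly
    for lb, k in sorted((lower_bound(seq, c), k) for k, c in enumerate(centroids)):
        if lb >= best_d:
            break  # sorted by lower bound, so no remaining centroid can be closer
        d = edit_distance(seq, centroids[k], cutoff=best_d)
        if d is not None and (d < best_d or k == best_k):
            best_k, best_d = k, d
    return best_k, best_d


def medoid(members, sample=20, rng=random):
    """Approximate centroid: the member with the smallest total distance to a sample."""
    ref = members if len(members) <= sample else rng.sample(members, sample)
    return min(members, key=lambda m: sum(edit_distance(m, r) for r in ref))


def split_in_two(members, iters=5, rng=random):
    """2-means-style split of one cluster, with medoids standing in for means."""
    centroids = rng.sample(list(dict.fromkeys(members)), 2)  # two distinct sequences
    for _ in range(iters):
        groups = ([], [])
        for s in members:
            groups[nearest(s, centroids)[0]].append(s)
        centroids = [medoid(g, rng=rng) for g in groups]
    return list(groups)


def spread(cluster, rng=random):
    """Total edit distance to the (approximate) medoid: an SSE analogue."""
    m = medoid(cluster, rng=rng)
    return sum(edit_distance(s, m) for s in cluster)


def bisecting_cluster(seqs, k, rng=random):
    """Bisecting K-means over sequences: keep splitting the loosest cluster."""
    clusters = [list(seqs)]
    while len(clusters) < k:
        splittable = [i for i, c in enumerate(clusters) if len(set(c)) >= 2]
        if not splittable:
            break  # remaining clusters hold identical sequences only
        worst = clusters.pop(max(splittable, key=lambda i: spread(clusters[i], rng)))
        clusters.extend(split_in_two(worst, rng=rng))
    return clusters
```

The candidate centroids are visited in order of their lower bound. The loop can therefore stop as soon as a bound reaches the best distance found so far, and the cutoff abandons a DP table early. Both shortcuts skip work while keeping the nearest-centroid result exact.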
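Contributions (2) and (3) can be illustrated with a small sketch: a weighted edit distance over clinical events, followed by a nearest-centroid anomaly test. The abstract does not state WED's cost formulas, so several details here are hypothetical. These are the `Event` fields, the per-kind weights in `IMPORTANCE`, the 0.5/0.25/0.25 mix of efficacy, dosage and price similarity, and the min/max ratio used for dosage and price. In practice the threshold would have to be calibrated on normal data.

```python
from dataclasses import dataclass

# Hypothetical importance weights per event kind (not taken from the thesis).
IMPORTANCE = {"surgery": 2.0, "drug": 1.0, "exam": 0.5}

# Hypothetical mix of efficacy, dosage and price similarity for drug events.
W_EFFICACY, W_DOSE, W_PRICE = 0.5, 0.25, 0.25


@dataclass(frozen=True)
class Event:
    kind: str            # e.g. "drug", "exam", "surgery"
    name: str
    efficacy: str = ""   # therapeutic class; only used for drugs
    dose: float = 0.0
    price: float = 0.0


def ratio(x, y):
    """Similarity of two non-negative quantities, in [0, 1]."""
    return 1.0 if max(x, y) == 0 else min(x, y) / max(x, y)


def similarity(a, b):
    """1.0 for equivalent events, 0.0 for unrelated ones."""
    if a.kind != b.kind:
        return 0.0
    if a.kind != "drug":
        return 1.0 if a.name == b.name else 0.0
    # Drugs: same therapeutic effect counts most, then dosage and price closeness.
    return (W_EFFICACY * (a.efficacy == b.efficacy)
            + W_DOSE * ratio(a.dose, b.dose)
            + W_PRICE * ratio(a.price, b.price))


def weight(e):
    return IMPORTANCE.get(e.kind, 1.0)


def weighted_edit_distance(s, t):
    """Edit distance where inserting/deleting an event costs its importance and
    substituting costs the larger importance times (1 - similarity)."""
    d = [[0.0] * (len(t) + 1) for _ in range(len(s) + 1)]
    for i in range(1, len(s) + 1):
        d[i][0] = d[i - 1][0] + weight(s[i - 1])
    for j in range(1, len(t) + 1):
        d[0][j] = d[0][j - 1] + weight(t[j - 1])
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            a, b = s[i - 1], t[j - 1]
            sub = max(weight(a), weight(b)) * (1.0 - similarity(a, b))
            d[i][j] = min(d[i - 1][j] + weight(a),
                          d[i][j - 1] + weight(b),
                          d[i - 1][j - 1] + sub)
    return d[-1][-1]


def anomaly_score(seq, centroids):
    """Difference between a sequence and its closest normal-cluster centroid."""
    return min(weighted_edit_distance(seq, c) for c in centroids)


def is_anomalous(seq, centroids, threshold):
    return anomaly_score(seq, centroids) > threshold
```

With these weights, doubling a drug's dose and price costs only 0.25, whereas an unexpected surgery inserted into an otherwise normal sequence costs 2.0. Important actions therefore dominate the anomaly score.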
[Degree-granting institution]: Jiangsu University
[Degree]: Master's
[Year conferred]: 2017
[CLC classification]: R197.3; TP311.13