A Sequence Clustering Algorithm Based on Edit Distance and Its Application to Clinical Anomaly Detection

Published: 2018-04-28 10:15

Topic: clinical behavior sequences + sequence clustering; Source: Jiangsu University, 2017 master's thesis

[Abstract]: As medical insurance covers more services and more people, medical fraud and other irregular practices have become common. Medicine is a highly specialized field, and information is asymmetric among the three parties to a medical insurance transaction (the insurer, the healthcare provider, and the patient). As a result, abnormal medical behavior is often well hidden from insurers. Methods for discovering abnormal clinical behavior therefore have significant theoretical and practical value for regulating medical practice and preventing fraud. Building on prior research in China and abroad, this thesis analyzes the temporal ordering and natural cohesion of clinical behavior sequences. It clusters a dataset of normal clinical behavior sequences with the bisecting K-means algorithm and uses the resulting clusters as normal profiles. Targeting the characteristics of abnormal clinical events, it then studies distance-based anomaly detection and implements a prototype clinical anomaly detection system that uncovers abnormal behavior hidden in clinical data. The main contributions are as follows:
(1) A bisecting K-means sequence clustering algorithm based on global similarity matching (PSClu) is proposed. PSClu changes how bisecting K-means computes distances. It uses edit distance as the function for measuring similarity between sequences across clusters. It then skips part of the edit-distance computations by combining three techniques: upper and lower bounds on the edit distance, edit-distance computation over equal-length prefix subsequences, and an approximate method for finding cluster centroids. This lowers the time complexity of bisecting K-means and quickly produces clusters of normal clinical behavior sequences.
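The abstract says PSClu skips some edit-distance computations by combining upper and lower bounds with edit distances over equal-length prefixes. It does not give the formulas. The Python sketch below shows one standard way such pruning can work and should not be read as the thesis's exact procedure. It assumes a sequence is a list of event codes, and the function names are hypothetical. The pruning rests on three bounds:

- The length difference is a lower bound.
- The mismatch count over the equal-length prefix plus the length difference is an upper bound.
- The minimum of each dynamic-programming row (the best alignment of a prefix) lets a computation stop once it can no longer beat the closest centroid found so far.

```python
import math
from typing import List, Optional, Sequence, Tuple

Seq = Sequence[str]  # assumption: a clinical behavior sequence is a list of event codes


def edit_distance_within(a: Seq, b: Seq, limit: float) -> Optional[int]:
    """Levenshtein distance between a and b if it is <= limit, otherwise None.

    The minimum of DP row i never decreases as i grows, so once every alignment
    of the prefix a[:i] costs more than `limit`, the full distance must too.
    """
    if limit < 0:
        return None
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # delete a[i-1]
                          curr[j - 1] + 1,     # insert b[j-1]
                          prev[j - 1] + cost)  # match or substitute
        if min(curr) > limit:
            return None  # early termination on the prefix row
        prev = curr
    return prev[-1] if prev[-1] <= limit else None


def edit_distance(a: Seq, b: Seq) -> int:
    """Unbounded Levenshtein distance."""
    return edit_distance_within(a, b, math.inf)


def bounds(a: Seq, b: Seq) -> Tuple[int, int]:
    """Cheap (lower, upper) bounds on the edit distance in O(min(m, n)) time."""
    lower = abs(len(a) - len(b))
    # Substitute every mismatch in the equal-length prefix, then insert/delete the tail.
    upper = sum(x != y for x, y in zip(a, b)) + lower
    return lower, upper


def nearest_centroid(seq: Seq, centroids: List[Seq]) -> Tuple[int, int]:
    """Index of and distance to the closest centroid, skipping hopeless candidates."""
    bnds = [bounds(seq, c) for c in centroids]
    limit = min(hi for _, hi in bnds)  # the nearest centroid is at most this far away
    best_idx, best_d = -1, -1
    for k in sorted(range(len(centroids)), key=lambda k: bnds[k][0]):
        if bnds[k][0] > limit:
            break  # visited in lower-bound order, so no remaining centroid can do better
        d = edit_distance_within(seq, centroids[k], limit)
        if d is not None:
            best_idx, best_d = k, d
            limit = d - 1  # from now on only strictly closer centroids matter
    return best_idx, best_d


centroids = [["reg", "lab", "surgery"], ["reg", "diag", "xray"]]
print(nearest_centroid(["reg", "diag", "xray", "drugA"], centroids))  # (1, 1)
```

Visiting centroids in order of their lower bound lets the loop stop at the first centroid whose lower bound rules out beating the best distance found so far. In this sketch, that is where most of the savings come from.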
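The clustering step in contribution (1) is bisecting K-means. It starts from one cluster and repeatedly splits a cluster in two with 2-means until k clusters exist. Below is a compact Python sketch over event sequences. Edit distance has no arithmetic mean, so this sketch assumes the cluster "centroid" is approximated by the medoid, the member with the smallest total distance to the others. The abstract only says the centroid is approximated, so several choices here are assumptions rather than the thesis's method:

- the medoid as the centroid approximation,
- the farthest-point seeding,
- the largest-SSE split rule.

None of PSClu's pruning is included.

```python
import random
from typing import List, Optional, Sequence, Tuple

Seq = Sequence[str]  # assumption: a sequence is a list of event codes


def edit_distance(a: Seq, b: Seq) -> int:
    """Levenshtein distance with unit costs (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
        prev = curr
    return prev[-1]


def medoid(cluster: List[Seq]) -> Seq:
    """Approximate centroid: the member with the smallest total distance to the rest."""
    return min(cluster, key=lambda s: sum(edit_distance(s, t) for t in cluster))


def sse(cluster: List[Seq]) -> int:
    """Sum of squared edit distances from each member to the cluster's medoid."""
    c = medoid(cluster)
    return sum(edit_distance(s, c) ** 2 for s in cluster)


def two_means(cluster: List[Seq], rng: random.Random,
              iters: int = 10) -> Optional[Tuple[List[Seq], List[Seq]]]:
    """Split one cluster in two with K-means (k = 2), using medoids as centres."""
    c1 = rng.choice(cluster)
    c2 = max(cluster, key=lambda s: edit_distance(s, c1))  # farthest member seeds the 2nd half
    if edit_distance(c1, c2) == 0:
        return None  # every member is identical, nothing to split
    for _ in range(iters):
        left, right = [], []
        for s in cluster:
            (left if edit_distance(s, c1) <= edit_distance(s, c2) else right).append(s)
        new1, new2 = medoid(left), medoid(right)
        if (new1, new2) == (c1, c2):
            break  # assignments are stable
        c1, c2 = new1, new2
    return left, right


def bisecting_kmeans(data: List[Seq], k: int, trials: int = 3, seed: int = 0):
    """Repeatedly bisect the cluster with the largest SSE until k clusters exist."""
    rng = random.Random(seed)
    clusters = [list(data)]
    while len(clusters) < k:
        splittable = [i for i, c in enumerate(clusters) if len(c) > 1]
        if not splittable:
            break
        idx = max(splittable, key=lambda i: sse(clusters[i]))
        best = None
        for _ in range(trials):  # keep the lowest-SSE split among several random starts
            split = two_means(clusters[idx], rng)
            if split is None:
                continue
            score = sse(split[0]) + sse(split[1])
            if best is None or score < best[0]:
                best = (score, split)
        if best is None:
            break  # the cluster's members are all identical, so no split is possible
        clusters[idx:idx + 1] = list(best[1])
    return clusters, [medoid(c) for c in clusters]


group_a = [["reg", "diag", "xray", "drugA"], ["reg", "diag", "xray", "drugA", "drugA"],
           ["reg", "diag", "drugA"]]
group_b = [["reg", "lab", "surgery", "drugB", "discharge"], ["reg", "lab", "surgery", "drugB"],
           ["reg", "lab", "lab", "surgery", "drugB", "discharge"]]
clusters, centres = bisecting_kmeans(group_a + group_b, k=2)
for members, centre in zip(clusters, centres):
    print(len(members), centre)
```

Bisecting the highest-SSE cluster is one common split rule; splitting the largest cluster is another. Each `sse` call recomputes a medoid with O(n²) edit distances. That is the kind of cost that pruning such as PSClu's is intended to reduce.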
(2) A method is proposed for computing the similarity between a sequence under test and a cluster centroid. To detect abnormal clinical behavior sequences more effectively, medication events are compared not only by drug-efficacy similarity but also by dosage and price similarity. Because medical behaviors differ in importance, a weighted edit distance algorithm (WED) is used to compute the similarity between the test sequence and the cluster centroid.
(3) An anomaly detection model is built. PSClu clusters the sequences into normal clusters, which serve as normal profiles. WED computes the similarity between a test sequence and the cluster centroids. The degree to which the test sequence differs from the centroids of the normal clusters is the criterion for deciding whether it is anomalous. The resulting model comprises data preprocessing, cluster generation, and similarity judgment.
(4) A prototype system based on the anomaly detection model is designed and implemented, including its pages, service layer, and persistence layer. The system's performance is analyzed and evaluated on clinical behavior data from a medical institution.
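The abstract describes WED only at a high level. Medication events are compared by efficacy, dosage, and price, and edit operations are weighted by the importance of the medical behavior. The Python sketch below fills in one possible concrete form. Several pieces are hypothetical choices, not the thesis's formulas:

- the `Event` fields,
- the min/max ratio used for dosage and price,
- the 0.6/0.2/0.2 mixing weights,
- the normalization to a [0, 1] similarity.

The drug data in the usage lines is illustrative only.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Event:
    code: str            # hypothetical event code (drug, test, procedure, ...)
    efficacy: str = ""   # drug efficacy class; empty for non-medication events
    dose: float = 0.0    # dosage in a common unit (assumption)
    price: float = 0.0
    weight: float = 1.0  # importance of this kind of behaviour (hypothetical scale)


def ratio_sim(x: float, y: float) -> float:
    """Similarity of two non-negative quantities as min/max; 1.0 when they are equal."""
    return 1.0 if x == y else min(x, y) / max(x, y)


def event_sim(a: Event, b: Event, w_eff=0.6, w_dose=0.2, w_price=0.2) -> float:
    """Similarity in [0, 1] between two events (hypothetical weighting)."""
    if a == b:
        return 1.0                                # identical events
    if not a.efficacy or not b.efficacy:          # at least one non-medication event
        return 1.0 if a.code == b.code else 0.0
    if a.efficacy != b.efficacy:                  # drugs with different efficacy
        return 0.0
    return w_eff + w_dose * ratio_sim(a.dose, b.dose) + w_price * ratio_sim(a.price, b.price)


def wed(a: Sequence[Event], b: Sequence[Event]) -> float:
    """Weighted edit distance: insert/delete cost = event weight,
    substitution cost = larger weight * (1 - similarity)."""
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + a[i - 1].weight
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + b[j - 1].weight
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = max(a[i - 1].weight, b[j - 1].weight) * (1.0 - event_sim(a[i - 1], b[j - 1]))
            d[i][j] = min(d[i - 1][j] + a[i - 1].weight,   # delete a[i-1]
                          d[i][j - 1] + b[j - 1].weight,   # insert b[j-1]
                          d[i - 1][j - 1] + sub)           # substitute or match
    return d[m][n]


def wed_similarity(a: Sequence[Event], b: Sequence[Event]) -> float:
    """Map WED to [0, 1]; deleting all of a and inserting all of b bounds the distance."""
    total = sum(e.weight for e in a) + sum(e.weight for e in b)
    return 1.0 if total == 0 else 1.0 - wed(a, b) / total


reg, xray = Event("reg", weight=0.5), Event("xray")
centroid = [reg, Event("amoxicillin", "antibiotic", 500, 10.0, 2.0), xray]
normal = [reg, Event("amoxicillin", "antibiotic", 500, 12.0, 2.0), xray]            # slightly pricier
suspicious = [reg, Event("amoxicillin", "antibiotic", 2000, 80.0, 2.0), xray, xray]  # 4x dose, repeat x-ray
print(round(wed(normal, centroid), 4), round(wed(suspicious, centroid), 4))  # 0.0667 1.65
```

A substitution costs the larger of the two event weights times (1 − similarity). That keeps it no more expensive than a deletion plus an insertion, so the ordinary edit-distance recurrence still applies with weighted costs.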
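Contribution (3) flags a sequence as anomalous according to how far it lies from the centroids of the normal clusters, but the abstract does not state the decision threshold. The Python sketch below assumes one. Each cluster gets an acceptance radius of mean + k·stdev of its members' distances to the centroid (k = 2, a hypothetical choice). A test sequence is anomalous if it falls outside the radius of its nearest centroid. Any distance function can be plugged in. Plain Levenshtein distance stands in for WED here so the example stays self-contained, and `Profile`, `build_profiles`, and `detect` are hypothetical names.

```python
import statistics
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Dist = Callable[[Sequence, Sequence], float]


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Unit-cost edit distance; a stand-in for WED in this sketch."""
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1,
                          prev[j - 1] + (a[i - 1] != b[j - 1]))
        prev = curr
    return prev[-1]


@dataclass
class Profile:
    centroid: Sequence
    radius: float  # hypothetical acceptance radius around the centroid


def build_profiles(clusters: List[List[Sequence]], centroids: List[Sequence],
                   dist: Dist, k: float = 2.0) -> List[Profile]:
    """Turn each normal cluster into a profile: centroid plus a mean + k * stdev radius."""
    profiles = []
    for members, c in zip(clusters, centroids):
        ds = [dist(s, c) for s in members]
        profiles.append(Profile(c, statistics.mean(ds) + k * statistics.pstdev(ds)))
    return profiles


def detect(seq: Sequence, profiles: List[Profile], dist: Dist) -> Tuple[bool, int, float]:
    """Return (is_anomalous, index of the nearest profile, distance to its centroid)."""
    ds = [dist(seq, p.centroid) for p in profiles]
    best = min(range(len(profiles)), key=ds.__getitem__)
    return ds[best] > profiles[best].radius, best, ds[best]


normal_a = [["reg", "diag", "xray", "drugA"], ["reg", "diag", "xray", "drugA", "drugA"],
            ["reg", "diag", "drugA"]]
normal_b = [["reg", "lab", "surgery", "drugB", "discharge"], ["reg", "lab", "surgery", "drugB"],
            ["reg", "lab", "lab", "surgery", "drugB", "discharge"]]
profiles = build_profiles([normal_a, normal_b], [normal_a[0], normal_b[0]], levenshtein)
print(detect(["reg", "diag", "xray", "drugA"], profiles, levenshtein))                        # (False, 0, 0)
print(detect(["reg", "diag", "surgery", "drugB", "drugB", "drugB"], profiles, levenshtein))  # (True, 1, 3)
```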
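[Degree-granting institution]: Jiangsu University
[Degree level]: Master's
[Year degree awarded]: 2017
[Chinese Library Classification number]: R197.3; TP311.13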
