Analysis of Medical Abuse Fraud Detection Based on Graph Mining

Published: 2018-01-20 02:54

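Keywords: medical insurance fraud; medical abuse; doctor trust degree; internal features and network graph exploration; non-convex label propagation algorithm　Source: Shandong University, 2017 master's thesis　Document type: degree thesis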

[Abstract]: In recent years, "Healthy China" has gradually risen to the level of a national strategy, and the development of medical insurance occupies an important place in economic and social development. As medical informatization continues to spread and advance, medical insurance fraud is increasingly recognized as a serious social problem. Medical abuse is one of the main forms of medical insurance fraud: a medical institution or doctor provides drugs or medical supplies that are inconsistent with those actually used in treatment, or that violate medication standards, thereby increasing healthcare expenditure. Medical insurance fraud cases are common. They greatly harm the interests of the insured, cause serious damage to the security of medical insurance funds, and seriously hinder the implementation and promotion of medical insurance policy.

Although medical fraud is not a new problem and various fraud detection methods have been proposed to address it, the problem has still not been solved well. First, traditional rule-based detection methods identify violations using fraud and non-fraud rules defined by experts, so they are often limited by the experts' level of knowledge. Second, although many papers propose methods for tackling fraud, the supervised methods among them focus on framing fraud detection as a binary classification problem. Medical insurance data is highly imbalanced, containing a large number of normal records and relatively few fraudulent ones, and this skewed class distribution makes it difficult to separate the very small amount of fraudulent data from the large amount of normal data. Medical insurance datasets also change dynamically over time in response to internal and external factors, so detection results are not very satisfactory. Finally, to produce more accurate detection results, supervised learning methods need to analyze the attributes of a large number of entities in the training data. This work takes a great deal of effort, and some of the attributes even violate privacy policies in the medical field. Unsupervised methods such as clustering-based outlier detection and cluster analysis take few input parameters and need only a small amount of information, so the accuracy of their results often falls short of what fraud detection requires. A medical insurance fraud detection method is therefore needed that involves only a small number of non-private attributes and achieves high accuracy.

The specific work and contributions of this thesis are summarized as follows:

1. A medical insurance fraud detection method based on doctor trust degree, GM-FP, is proposed. Using doctor trust degree as the key feature, the method combines graph mining with frequent pattern mining. It uses only medical records to train a model of reasonable treatment for a given disease (the types and quantities of drugs and medical facilities, and the relationships among them), and judges whether a record is fraudulent from the similarity between an unknown record and the reasonable model.

2. An anomaly detection method based on the internal features of the medical record dataset and on network graph exploration, IF-NE, is proposed. For each medical insurance record, IF-NE analyzes the record's internal features and its network-based features, selects a suitable classifier according to those features to separate normal records from abnormal ones, and thereby decides whether the record is fraudulent. The internal features are derived from RFM (recency, frequency, and monetary amount). Network-based feature extraction enriches the doctor-patient bipartite graph model by adding medical records, forming a doctor-patient-record tripartite graph model. In addition, a new algorithm that infers scores for all network components (doctors, patients, and medical insurance records) from a limited set of labeled edges (known fraudulent medical insurance records) is used to obtain the network-based features. Finally, a random forest performs fraud detection on the records using these features, and the results show that the method outperforms the baseline methods.

3. A fraud detection method based on rare label propagation is proposed. The method improves on traditional convex label propagation: through a convex-concave transformation, the convex label propagation algorithm is turned into a non-convex label propagation algorithm for rare labels, which addresses the performance drop of label propagation on medical insurance datasets with a low degree of supervision and high class imbalance.
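
As a rough illustration of the recency, frequency, and monetary features that the abstract names as the internal features of IF-NE, the sketch below computes them per entity from a list of records. The abstract does not say how the thesis computes these features, so the function name `rfm_features`, the `(entity_id, visit_date, amount)` record layout, and the sample data are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import date


def rfm_features(records, as_of):
    """Compute recency, frequency, and monetary features per entity.

    records: iterable of (entity_id, visit_date, amount) tuples
             (a hypothetical layout, not taken from the thesis).
    as_of:   reference date from which recency is measured.
    """
    last_seen = {}
    count = defaultdict(int)
    total = defaultdict(float)
    for entity_id, visit_date, amount in records:
        # Track the most recent record date for each entity.
        if entity_id not in last_seen or visit_date > last_seen[entity_id]:
            last_seen[entity_id] = visit_date
        count[entity_id] += 1
        total[entity_id] += amount
    return {
        entity_id: {
            "recency_days": (as_of - last_seen[entity_id]).days,
            "frequency": count[entity_id],
            "monetary": total[entity_id],
        }
        for entity_id in last_seen
    }


# Hypothetical sample records, for illustration only.
records = [
    ("patient_a", date(2017, 1, 5), 120.0),
    ("patient_a", date(2017, 3, 1), 80.0),
    ("patient_b", date(2017, 2, 10), 300.0),
]
features = rfm_features(records, as_of=date(2017, 3, 31))
```

Feature dictionaries of this kind could then be joined with network-based scores and passed to a classifier such as a random forest, which is the combination the abstract describes at a high level.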

[Degree-granting institution]: Shandong University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: R197.1;TP311.13

[Similar literature]