An Improved K-Means-Boosting-BP Model for Imbalanced Police-Incident Data
Published: 2018-04-10 23:23
Topics: imbalanced data + Synthetic; Reference: Journal of Image and Graphics (《中国图象图形学报》), 2017, Issue 09
[Abstract]: Objective: Understanding the spatio-temporal distribution of police incidents, building spatio-temporal prediction models of those incidents with machine learning algorithms, and drawing up evidence-based prevention and control plans that effectively curb crime are central concerns of crime geography. Previous studies show that police incidents concentrate in central urban areas or densely populated residential areas, so the data are imbalanced in both space and time. This imbalance usually turns a model trained on the data into a weak learner with low prediction accuracy. To address the regression problem on such imbalanced data, a Boosting algorithm based on K-means clustering is proposed. Methods: The algorithm builds on Boosting ensemble learning. It uses a GA-BP neural network to generate the base classifiers and uses K-means clustering to combine them into an ensemble, thereby lifting weak learners into a strong learner. Results: Compared with the Synthetic Minority Oversampling Technique Boosting (SMOTEBoosting) algorithm, which is commonly used for regression on imbalanced data, the proposed algorithm has two advantages: 1) it lowers the mean squared error of the minority class while also lowering the overall mean squared error: the overall mean squared error is 2.14E-04 for SMOTEBoosting and 9.85E-05 for KMeans-Boosting; 2) it strikes a better balance between precision and recall when identifying minority-class samples: recall is about 52% for KMeans-Boosting and about 91% for SMOTEBoosting, but the precision of KMeans-Boosting is 85%, far higher than the 19% of SMOTEBoosting. Conclusion: The KMeans-Boosting algorithm significantly reduces the overall mean squared error on imbalanced data and improves the precision and recall of minority-class identification. It is an effective algorithm for regression and classification problems on imbalanced data and can be extended to other fields that need to handle imbalanced data.
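The precision/recall trade-off reported in the abstract can be made concrete with a small worked calculation. The sketch below is not part of the paper: it assumes the F1 score (the harmonic mean of precision and recall) as one possible single-number summary of "balance", and it uses only the figures quoted in the abstract, where the recall values are stated as approximate. The names `f1_score` and `reported` are illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract; F1 is our own summary, not a reported result.
reported = {
    "KMeans-Boosting": {"precision": 0.85, "recall": 0.52, "mse": 9.85e-05},
    "SMOTEBoosting": {"precision": 0.19, "recall": 0.91, "mse": 2.14e-04},
}

f1 = {name: f1_score(m["precision"], m["recall"]) for name, m in reported.items()}

# How many times larger the overall MSE of SMOTEBoosting is.
mse_ratio = reported["SMOTEBoosting"]["mse"] / reported["KMeans-Boosting"]["mse"]

for name, value in f1.items():
    print(f"{name}: F1 = {value:.3f}")
print(f"Overall MSE ratio (SMOTEBoosting / KMeans-Boosting) = {mse_ratio:.2f}")
```

Under this assumed summary, the lower recall of KMeans-Boosting is outweighed by its much higher precision (F1 of roughly 0.65 versus 0.31), and its overall mean squared error is a little under half that of SMOTEBoosting.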
[Author affiliations]: South China Normal University; Guangdong Jingyi Planning Information Technology Co., Ltd. (广东精一规划信息科技股份有限公司)
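[Funding]: Ministry of Public Security special project for basic work on strengthening policing through science and technology (2016GABJC47)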
[Classification number]: D035.3; TP311.13
Article ID: 1733403
Article link: https://www.wllwen.com/falvlunwen/fanzuizhian/1733403.html