Research and Implementation of Clustering-Based Case Linkage Analysis

Published: 2018-04-15 22:00

Topic: case linkage analysis + case features; source: Donghua University, 2017 master's thesis

[Abstract]: With sustained economic development and rising living standards, crime in China is becoming increasingly professionalized, gang-based, and mobile. Crime of all kinds is increasing; high-frequency property crimes have grown especially quickly, and series cases account for a large share of them. A main task of public security analysts today is case linkage analysis: investigating sets of intrinsically related cases, mining the crime data and case-solving experience that public security departments have accumulated over the years, identifying patterns and trends in criminal behavior, and understanding how cases relate to one another. This thesis analyzes data on a subset of recent property crime cases in Changning District, provided by the public security department, to determine which cases belong to the same series. It first examines the feature descriptions of each case to fix the scope of information extraction and prepare the data. The case features mentioned in the case description text are then extracted, and all feature data are transformed and clustered. Cases assigned to the same cluster are treated as a series and analyzed together. The thesis focuses on two problems: how to extract and complete incomplete case features from the textual case descriptions, and how to perform fuzzy clustering on those features. Finally, based on practical requirements, a case linkage analysis system is designed and implemented. It has three modules: case feature extraction, cluster analysis, and display. The feature extraction module extracts feature descriptions from unstructured data; it combines Chinese word segmentation with a conditional random field (CRF) model to label the text, then extracts case features by pattern matching. The cluster analysis module preprocesses the feature data and clusters it with the FCM (fuzzy C-means) algorithm, using a suitably chosen number of clusters and weighting coefficient. The display module uses JavaEE and ECharts to build a web page that presents the analysis results as text and charts. Repeated testing shows that the system meets the needs of this study to a certain extent and gives public security analysts a more intuitive and reliable view of the relationships among case features, so it has some practical value.
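The clustering step named in the abstract can be illustrated with a minimal fuzzy C-means sketch. This is not the thesis's implementation: the abstract does not state how case features are encoded or which distance measure is used, so the sketch assumes features have already been converted to numeric vectors and uses Euclidean distance. The names `fcm`, `hard_labels`, and `cases`, and the sample values, are hypothetical. The parameter `c` stands for the number of clusters and `m` for the fuzzy weighting exponent, which is one plausible reading of the "weighting coefficient" the abstract mentions.

```python
import random


def fcm(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Fuzzy C-means on numeric vectors; returns (centers, memberships)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / w_sum
                 for d in range(dim)]
            )
        # Membership update from Euclidean distances (assumed metric).
        shift = 0.0
        for i in range(n):
            dist = [
                max(sum((points[i][d] - centers[j][d]) ** 2
                        for d in range(dim)) ** 0.5, 1e-12)
                for j in range(c)
            ]
            new_row = [
                1.0 / sum((dist[j] / dist[k]) ** (2.0 / (m - 1.0))
                          for k in range(c))
                for j in range(c)
            ]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:  # memberships have stopped changing
            break
    return centers, u


def hard_labels(memberships):
    """Assign each case to the cluster with its highest membership."""
    return [max(range(len(row)), key=row.__getitem__) for row in memberships]


# Hypothetical, already-numeric case feature vectors (two obvious groups).
cases = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
         (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centers, u = fcm(cases, c=2, m=2.0)
labels = hard_labels(u)
```

Cases that share a label would be the candidates for one series. Because each case keeps a membership degree in every cluster, borderline cases can be flagged for manual review rather than forced into a single series.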
[Degree-granting institution]: Donghua University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: D631.2;TP311.13

[Similar literature]