Research on Fine-Grained Classification and Serial Case Analysis Techniques for Criminal Cases
[Abstract]: With the rapid development of information technology, information systems in the public security field now hold huge volumes of data, most of it text, and traditional manual processing can no longer meet operational needs. More automated and intelligent text mining techniques are required to improve the efficiency of case handling. Working with the text of criminal case records, this thesis addresses two problems of general concern to criminal investigators: fine-grained classification of cases and serial case analysis (linking cases that may belong to the same series). For classification, a two-level method, TLC-NBK, based on naive Bayes and a keyword co-occurrence graph is proposed. The method targets the characteristics of case texts: short length, low word frequency, and a hierarchical, imbalanced distribution of categories. At the first level, part-of-speech features are added to the document frequency (DF) method to form a two-factor evaluation algorithm for feature selection, and naive Bayes classification is then performed with a multivariate Bernoulli model adapted to imbalanced categories. At the second level, building on the first-level classifier, keyword co-occurrence vectors are constructed from the document set for the second-level case categories under the assigned first-level category. Weights are computed from the co-occurrence relations between keywords rather than from word frequency, and an inverse class frequency factor is introduced to adjust the co-occurrence weights. A simple vector distance algorithm then assigns the fine-grained second-level case category. In addition, a synonym network is used to remove the interference of domain synonyms with the classification results. For serial case analysis, a density clustering method based on case features is proposed.
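The first-level step described above can be illustrated with a standard multivariate Bernoulli naive Bayes classifier. This is a minimal sketch only: the abstract does not specify how the thesis adapts the model to imbalanced categories, so none of that adaptation is reproduced here. The function names `train_bernoulli_nb` and `predict`, the Laplace smoothing, and the toy English tokens are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def train_bernoulli_nb(docs, labels, vocab):
    """Fit a plain multivariate Bernoulli naive Bayes model.

    docs: list of token collections, labels: list of class names,
    vocab: the selected feature words (for example, the output of a
    feature selection step).
    """
    vocab = sorted(set(vocab))
    vocab_set = set(vocab)
    class_docs = defaultdict(int)                      # documents per class
    present = defaultdict(lambda: defaultdict(int))    # docs containing word, per class
    for words, label in zip(docs, labels):
        class_docs[label] += 1
        for w in set(words) & vocab_set:
            present[label][w] += 1
    model = {}
    for label, n_c in class_docs.items():
        log_prior = math.log(n_c / len(docs))
        # Laplace smoothing keeps every probability strictly inside (0, 1).
        p = {w: (present[label][w] + 1) / (n_c + 2) for w in vocab}
        model[label] = (log_prior, p)
    return model


def predict(model, words):
    """Return the class with the highest log posterior.

    Unlike the multinomial model, absent words also contribute,
    through log(1 - p).
    """
    words = set(words)
    best_label, best_score = None, -math.inf
    for label, (log_prior, p) in model.items():
        score = log_prior
        for w, p_w in p.items():
            score += math.log(p_w) if w in words else math.log(1.0 - p_w)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data (hypothetical, not from the thesis).
docs = [{"pry", "door", "night"}, {"pry", "window"},
        {"snatch", "phone", "street"}, {"snatch", "bag"}]
labels = ["burglary", "burglary", "robbery", "robbery"]
vocab = ["pry", "door", "window", "snatch", "phone", "bag"]

model = train_bernoulli_nb(docs, labels, vocab)
print(predict(model, {"pry", "window", "night"}))
```

Because the Bernoulli model records only whether a word is present, it suits short texts in which most words occur once, which matches the low word frequency the abstract attributes to case descriptions.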
The clustering method first extracts structured case features from unstructured case descriptions by combining rules and dictionaries. It then defines a formula for the feature similarity between case texts that jointly considers the influence of the fine-grained case category, time, and location, with the weight of each dimension determined by the analytic hierarchy process (AHP). Finally, drawing on the classical density clustering algorithm OPTICS, a feature density clustering algorithm (OPTICS-FD) is proposed to cluster cases effectively and to assist criminal investigators in linking and solving serial cases. The two-factor evaluation algorithm, the two-level classifier, case feature extraction, and serial case clustering are each tested experimentally. The results show that, for criminal case text mining, the TLC-NBK method raises accuracy and recall by 7.53% and 12.99%, respectively, and that the OPTICS-FD algorithm achieves a reduction rate of 66.52% and a recall rate of 91.25%.
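The AHP weighting step mentioned above can be sketched as follows. This is an illustrative assumption, not the formula from the thesis: the abstract gives neither the pairwise comparison matrix nor the similarity formula. Here the weights are approximated with the row geometric mean method, a common alternative to computing the principal eigenvector, and the overall similarity is taken to be a weighted sum. The dimension names, matrix values, and per-dimension similarity scores are hypothetical.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights by the row geometric mean method.

    pairwise[i][j] states how much more important dimension i is than
    dimension j; the matrix is expected to be reciprocal
    (pairwise[j][i] == 1 / pairwise[i][j]).
    """
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def case_similarity(sims, weights):
    """Combine per-dimension similarities in [0, 1] into one weighted score."""
    return sum(weights[name] * sims[name] for name in weights)


# Hypothetical dimensions and judgments (not taken from the thesis).
dimensions = ["features", "category", "time", "location"]
pairwise = [
    [1,     2,     4,     8],
    [1 / 2, 1,     2,     4],
    [1 / 4, 1 / 2, 1,     2],
    [1 / 8, 1 / 4, 1 / 2, 1],
]

weights = dict(zip(dimensions, ahp_weights(pairwise)))

# Hypothetical per-dimension similarities for one pair of cases.
sims = {"features": 0.9, "category": 1.0, "time": 0.5, "location": 0.25}
print(weights)
print(case_similarity(sims, weights))
```

A distance such as one minus this similarity could then be passed to a density clustering algorithm in the OPTICS family, although how OPTICS-FD actually uses the similarity is not stated in the abstract.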
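[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TP391.1; D918.2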