Research on Fine-Grained Classification and Serial Case Analysis Techniques for Criminal Cases

Published: 2018-09-08 17:00
[Abstract]: With the rapid development of information technology, intelligence information systems in the public security field face massive volumes of data, most of it text. Traditional manual processing can no longer meet operational needs, so more automated and intelligent text mining techniques are required to improve case-handling efficiency. Working with criminal case texts, this thesis studies two problems of general concern to criminal investigators: fine-grained case classification and serial case analysis (linking cases that may belong to the same series). It proposes TLC-NBK, a two-level classification method based on naive Bayes and keyword co-occurrence graphs. Case texts are short, have low term frequencies, and fall into categories that are hierarchical and imbalanced. To suit these characteristics, the first level introduces part-of-speech features on top of the document frequency (DF) method and proposes a two-factor evaluation algorithm for feature selection, then performs naive Bayes classification with a multivariate Bernoulli model adapted to imbalanced categories, which yields fast and accurate assignment of first-level case categories. Building on the first-level classifier, the second level constructs, for each second-level category under the predicted first-level category, a keyword co-occurrence vector that takes the document set as its basic unit. Weights are computed from co-occurrence relations between keywords rather than from term frequency, and a proposed inverse category frequency factor corrects the co-occurrence weights. A simple vector distance algorithm then performs the fine-grained second-level classification. In addition, a synonym network is used to eliminate the interference of domain synonyms with the classification results. A density-based clustering method built on case features is also proposed to perform serial case analysis.
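As a rough illustration of the first-level classifier described above, the following is a minimal sketch of a standard multivariate Bernoulli naive Bayes model with Laplace smoothing. It is not the thesis's implementation: the adaptation to imbalanced categories and the two-factor feature selection are not specified in the abstract and are not reproduced here. The function names, the toy documents, and the hand-picked vocabulary are all hypothetical.

```python
import math
from collections import defaultdict


def train_bernoulli_nb(docs, labels, vocabulary, alpha=1.0):
    """Fit a multivariate Bernoulli naive Bayes model.

    docs: list of token lists; labels: list of class names;
    vocabulary: feature words (assumed to come from a prior selection step).
    """
    vocab = sorted(set(vocabulary))
    vocab_set = set(vocab)
    class_docs = defaultdict(int)                    # number of documents per class
    present = defaultdict(lambda: defaultdict(int))  # class -> word -> docs containing it
    for tokens, label in zip(docs, labels):
        class_docs[label] += 1
        for word in set(tokens) & vocab_set:
            present[label][word] += 1

    total = len(docs)
    model = {"vocab": vocab, "log_prior": {}, "p_word": {}}
    for label, n in class_docs.items():
        model["log_prior"][label] = math.log(n / total)
        # Laplace-smoothed probability that a word occurs in a document of this class.
        model["p_word"][label] = {
            w: (present[label][w] + alpha) / (n + 2 * alpha) for w in vocab
        }
    return model


def predict_bernoulli_nb(model, tokens):
    """Return the class with the highest log posterior for one document."""
    seen = set(tokens)
    best_label, best_score = None, -math.inf
    for label, log_prior in model["log_prior"].items():
        score = log_prior
        for w in model["vocab"]:
            p = model["p_word"][label][w]
            # The Bernoulli model scores absent words as well as present ones.
            score += math.log(p) if w in seen else math.log(1.0 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data standing in for segmented case descriptions.
docs = [
    ["broke", "window", "stole", "cash"],
    ["stole", "phone", "bus"],
    ["fake", "call", "transfer", "money"],
    ["fake", "link", "transfer"],
]
labels = ["theft", "theft", "fraud", "fraud"]
vocabulary = ["stole", "fake", "transfer", "window", "phone"]

model = train_bernoulli_nb(docs, labels, vocabulary)
print(predict_bernoulli_nb(model, ["stole", "wallet"]))
print(predict_bernoulli_nb(model, ["fake", "transfer", "account"]))
```

The Bernoulli variant records only whether a word is present, which is one plausible reason it suits short texts with low term frequencies such as those the abstract describes.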
The clustering method first combines rules and dictionaries to extract structured case features from unstructured case descriptions. It then defines a feature similarity formula between case texts that jointly accounts for the fine-grained case category, the time of the crime, and the location of the crime, with the weight of each dimension determined by the analytic hierarchy process (AHP). Finally, drawing on the ideas of the classical density clustering algorithm OPTICS, it proposes the feature density clustering algorithm OPTICS-FD, which can effectively identify dense clusters of serial cases and assist investigators in solving them. The two-factor evaluation algorithm, the two-level classifier, case feature extraction, and serial case clustering were all tested experimentally. The results show that, in criminal case text mining, TLC-NBK improves accuracy and recall over traditional methods by 7.53% and 12.99% respectively, and OPTICS-FD achieves a reduction rate of 66.52% and a recall of 91.25%, providing better decision support for criminal investigators.
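To make the idea of an AHP-weighted feature similarity more concrete, here is a small sketch under stated assumptions. The abstract gives neither the thesis's similarity formula nor its pairwise comparison judgments, so everything below is hypothetical: the comparison matrix, the per-dimension similarity functions (exact match for category and district, exponential decay for the time gap), the `time_scale_days` parameter, and the field names. The weights are approximated with the row geometric mean method, a common shortcut for AHP priorities.

```python
import math
from datetime import datetime


def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric mean method."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def case_similarity(a, b, weights, time_scale_days=7.0):
    """Weighted sum of category, time, and location similarity, each in [0, 1]."""
    w_cat, w_time, w_loc = weights
    cat_sim = 1.0 if a["category"] == b["category"] else 0.0
    gap_days = abs((a["time"] - b["time"]).total_seconds()) / 86400.0
    time_sim = math.exp(-gap_days / time_scale_days)  # decays as the time gap grows
    loc_sim = 1.0 if a["district"] == b["district"] else 0.0
    return w_cat * cat_sim + w_time * time_sim + w_loc * loc_sim


# Hypothetical judgments: rows and columns are (category, time, location).
# An entry of 3 means the row criterion is judged moderately more important
# than the column criterion; the matrix is reciprocal.
pairwise = [
    [1.0, 3.0, 2.0],
    [1.0 / 3.0, 1.0, 0.5],
    [0.5, 2.0, 1.0],
]
weights = ahp_weights(pairwise)

# Hypothetical structured features, as might result from the extraction step.
case_a = {"category": "burglary", "time": datetime(2016, 3, 1, 22, 0), "district": "District A"}
case_b = {"category": "burglary", "time": datetime(2016, 3, 3, 23, 0), "district": "District A"}
case_c = {"category": "phone fraud", "time": datetime(2016, 5, 1, 10, 0), "district": "District B"}

print([round(w, 3) for w in weights])
print(case_similarity(case_a, case_b, weights))
print(case_similarity(case_a, case_c, weights))
```

A pairwise similarity of this kind could then be converted to a distance (for example, one minus the similarity) and passed to a density-based clustering step. How OPTICS-FD actually does this is not described in the abstract.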
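[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year degree conferred]: 2016
[Classification number]: TP391.1; D918.2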



