Research on Automatic Reduction of Large-Scale Forensic Logs for Intrusion Forensics
Published: 2018-10-05 13:40
[Abstract]: With the rapid development of information technology, computer crime (such as hacking intrusions) has become a destabilizing factor that cannot be ignored, directly affecting the normal order of politics, the economy, culture, and other fields. Under these circumstances, research on intrusion forensics is important for combating computer crime and strengthening computer network security. Log data of all kinds is an important candidate source of evidence for intrusion forensic analysis, as it records many of the user behaviors produced during an intrusion as well as the behavior of the intrusion detection system (IDS) itself. However, existing log data still presents many problems when used for forensic analysis. The most prominent is the sheer scale of the log data sets: the volume can reach hundreds of thousands or even millions of records per week, so useful information (such as attack-related events) is inevitably buried in a large number of useless or redundant events triggered by normal system behavior, making intrusion forensic analysis more difficult. This thesis proposes a parallel method for automatically reducing forensic logs based on information theory and attribute weighting. The method is built on the open-source Hadoop framework: the MapReduce model partitions the attributes vertically, and each attribute subset is processed in parallel. For each attribute subset, two metrics, mutual information and entropy weight, measure the correlation between the current attribute and the other attributes; attributes with a large entropy weight and a small mutual information value are selected as independent. The entropy weights are then used to weight the selected attributes, yielding a Score for each record; the Scores are sorted, a threshold is set, and the redundant log records to be deleted are produced as an intermediate result. Finally, a specially designed function performs a second simplification pass over the remaining log records to obtain the redundant records to delete. Experiments on several representative data sets from Windows and Linux platforms show that the method is fast and efficient, requires no prior knowledge, needs little manual intervention, and is suitable for large-scale data.
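The attribute-selection and scoring step described above can be sketched in standalone Python. This is only an illustrative, single-machine approximation of the ideas the abstract names (Shannon entropy, pairwise mutual information, entropy-weighted record Scores), not the thesis's actual Hadoop/MapReduce implementation; the function names, the averaged-MI independence rule, and the frequency-based Score are assumptions made for the sketch.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of a discrete attribute column."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for two discrete columns."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def entropy_weights(columns):
    """Normalize each attribute's entropy into a weight (weights sum to 1)."""
    hs = [entropy(col) for col in columns]
    total = sum(hs) or 1.0
    return [h / total for h in hs]

def select_attributes(columns, mi_threshold=0.1):
    """Treat attributes with low average MI against all others as independent."""
    k = len(columns)
    keep = []
    for i in range(k):
        avg_mi = sum(mutual_information(columns[i], columns[j])
                     for j in range(k) if j != i) / max(k - 1, 1)
        if avg_mi <= mi_threshold:
            keep.append(i)
    return keep

def record_scores(columns, keep, weights):
    """Score each record as the weighted relative frequency of its values on
    the selected attributes. Very common (high-score) records become the
    redundancy candidates deleted above a chosen threshold."""
    n = len(columns[0])
    freqs = {i: Counter(columns[i]) for i in keep}
    return [sum(weights[i] * freqs[i][columns[i][r]] / n for i in keep)
            for r in range(n)]
```

In the thesis, this computation runs per attribute subset inside MapReduce tasks and is followed by a second simplification pass over the remaining records; here everything runs in one process for clarity.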
[Degree-granting institution]: Nanjing University
[Degree level]: Master's
[Year conferred]: 2014
[CLC number]: TP393.08
Article ID: 2253604
Link: https://www.wllwen.com/jingjilunwen/zhengzhijingjixuelunwen/2253604.html