Design and Implementation of a Trojan Detection System Based on Data Mining and Machine Learning

Published: 2018-03-16 12:02

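Topic: web page Trojan; Focus: JavaScript; Source: University of Electronic Science and Technology of China (电子科技大学), 2014 master's thesis; Document type: degree thesis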

[Abstract]: Computer networks are changing the way people live, but their openness and interconnectedness make them easy targets for attackers, so network security is drawing growing attention. Web page Trojans have become the number one threat to network security: virus propagation, illegal intrusion, server outages, and similar incidents are all delivered with Trojans as the carrier. Pattern-matching detection is the most widely used approach in current security detection systems, but it depends on signatures extracted through manual analysis, cannot predict unknown malicious code, and is powerless against obfuscated or morphed malicious code. Data mining and machine learning are active research areas in computing, and combining the two to detect web page Trojans is a likely direction for future research. Motivated by these problems, and building on an in-depth analysis of the principles of data mining and machine learning, this thesis designs and implements a web page Trojan detection system that targets malicious JavaScript. The main work is as follows:

1. The thesis first introduces the main principles and theory of data mining and machine learning. It then surveys the mainstream web page Trojan detection algorithms published in China and abroad and analyzes the strengths and weaknesses of each.

2. Following software engineering principles, the thesis analyzes the main functional requirements, overall architecture, and workflow of the Trojan detection system. A prototype web page Trojan detection system is then designed and implemented with VC++ 6.0 MFC, MySQL, and other tools. The system consists of functional submodules including a URL blacklist, a web crawler, feature extraction, and a BP neural network ensemble classifier.

3. Most web page Trojans currently embed malicious JavaScript code in the page, so the thesis concentrates on detecting web page Trojans that rely on malicious JavaScript. To evade antivirus software, malicious JS code is usually obfuscated or morphed, which makes conventional signature matching largely ineffective against obfuscated web page Trojans. The thesis uses the Google V8 JavaScript engine to compile malicious JS scripts into machine code, extracts the opcodes from the machine instructions, and computes word-level N-gram occurrence frequencies. The 200 most frequent grams serve as the features that distinguish benign scripts from malicious ones.

4. Using a web crawler and other tools, 100 benign JS scripts and 100 malicious JS scripts were collected from the Internet as the sample set. These 200 samples are used to train the BP neural network ensemble classifier, and 4-fold cross-validation is used to evaluate the precision and accuracy of the detection method. Once the classifier reaches sufficient accuracy, the trained model is deployed in the web page Trojan detection system. Finally, the functionality and robustness of the system are tested.
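The feature-extraction and evaluation steps described in the abstract can be illustrated with a minimal sketch. It assumes the opcode sequence of each script has already been obtained (the V8 compilation step is not shown), and it treats opcodes as plain strings. The function names, the toy opcode values, and the interleaved fold assignment are hypothetical choices made for illustration; the abstract specifies neither the value of N nor how the folds were drawn. In the thesis setting the calls would use `top_k=200` and `k=4`.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

Gram = Tuple[str, ...]


def ngrams(opcodes: Sequence[str], n: int) -> List[Gram]:
    """Return every contiguous run of n opcodes in one script's opcode sequence."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def build_vocabulary(samples: Iterable[Sequence[str]], n: int, top_k: int) -> List[Gram]:
    """Select the top_k most frequent n-grams over all training samples."""
    counts: Counter = Counter()
    for opcodes in samples:
        counts.update(ngrams(opcodes, n))
    return [gram for gram, _ in counts.most_common(top_k)]


def feature_vector(opcodes: Sequence[str], vocabulary: Sequence[Gram], n: int) -> List[float]:
    """Relative frequency of each vocabulary n-gram within a single sample."""
    grams = ngrams(opcodes, n)
    counts = Counter(grams)
    total = len(grams) or 1  # guard against samples shorter than n
    return [counts[g] / total for g in vocabulary]


def k_fold_indices(num_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Build k (train, test) index pairs; fold i holds indices i, i + k, i + 2k, ..."""
    folds = [list(range(i, num_samples, k)) for i in range(k)]
    return [
        ([j for f, fold in enumerate(folds) if f != i for j in fold], folds[i])
        for i in range(k)
    ]


if __name__ == "__main__":
    # Toy opcode sequences standing in for real V8 output (hypothetical values).
    samples = [["a", "b", "a", "b", "a", "b"], ["a", "b", "c"]]
    vocab = build_vocabulary(samples, n=2, top_k=2)
    print(vocab)
    print(feature_vector(samples[1], vocab, n=2))
    print(k_fold_indices(8, 4)[0])
```

Each feature vector would then be fed to the classifier, with one vector per script and one training run per fold. Building the vocabulary from the training folds only, rather than from all samples, keeps test data from influencing feature selection; whether the thesis did so is not stated in the abstract.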
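[Degree-granting institution]: University of Electronic Science and Technology of China (电子科技大学)
[Degree level]: Master's
[Year degree granted]: 2014
[CLC number]: TP393.08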

Article ID: 1619826


Link to this article: https://www.wllwen.com/guanlilunwen/ydhl/1619826.html
