Research on Detection Techniques for Malicious Web Pages and PDF Documents Based on the SVM Model

Published: 2019-04-11 15:02

【Abstract】: While the Internet provides people with more convenient and faster information services, its openness and fragility also open the door to attackers. Among current network attacks, the most prevalent approach uses script elements as the carrier of attack code, exploits vulnerabilities in browsers and their plug-ins, and covertly downloads and executes malicious programs on the client in order to attack the user. This typical web Trojan attack poses a serious threat to Internet security. Traditional anti-virus engines based on static signatures detect web Trojans mainly by signature matching. This method cannot detect obfuscated malicious code, and the static signature database grows very large over time, which eventually degrades detection performance. It is therefore necessary to study a technique that detects obfuscated malicious code quickly without relying on a static signature database. In addition, with the wide use of PDF documents and the many vulnerabilities in PDF reader software, PDF documents have gradually become a carrier for spreading web Trojans. A hybrid-sample detection engine that can detect both malicious Web pages and malicious PDF documents therefore has broad market prospects. Starting from these observations, this thesis analyzes the structure of Web samples and PDF samples, and then uses support vector machine (SVM) technology, which is based on statistical learning theory, together with shellcode emulation based on dynamic execution, to implement a detection engine that can quickly detect malicious code hidden in Web pages or PDF documents. The main work of the thesis is as follows:
(1) The attack and defense techniques for web Trojans are comprehensively summarized. The basic attack principles and attack methods of web Trojans are described, and the defense techniques targeting different stages (such as the website server side, the intermediate proxy side, and the client side) are analyzed along with their advantages and disadvantages.
(2) SVM technology is used to detect obfuscated malicious web page code, overcoming the shortcomings of traditional detection based on static signatures. The structure of each sample under test is analyzed and its JS code extracted; an SVM is then trained on a large number of JS feature characters to obtain a classifier that distinguishes malicious samples from benign ones, enabling fast detection (classification) of obfuscated malicious code.
(3) Stream objects in the PDF document structure are statically analyzed to extract the JS code embedded in them, and the SVM-based detection is then applied to that JS code, enabling detection of malicious PDF documents.
(4) A dynamic emulation tool is used to emulate execution of the shellcode in malicious scripts, producing a detailed behavior analysis report of the malicious code that helps analysts examine it intuitively and in detail.
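The front end of the pipeline described in items (2) and (3) can be illustrated with a minimal Python sketch: decode the stream objects of a PDF, then turn the recovered script into character-level statistics that an SVM could take as input. The abstract does not give the thesis's actual feature set or tooling, so everything here is an assumption made for illustration: the function names `extract_stream_texts`, `char_features`, and `build_demo_pdf`, the four features, and the simplification that every stream is either Flate-compressed or plain. Locating streams by regular expression rather than by the `/Length` entry is also a shortcut, and the classifier itself is omitted.

```python
import math
import re
import zlib
from collections import Counter

# Bytes between a "stream" keyword (not the tail of "endstream") and "endstream".
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)\nendstream", re.DOTALL)
# Escape sequences commonly seen in obfuscated JS: %uXXXX and \xXX.
ESCAPE_RE = re.compile(r"%u[0-9a-fA-F]{4}|\\x[0-9a-fA-F]{2}")


def extract_stream_texts(pdf_bytes):
    """Return the decoded contents of every stream object in pdf_bytes.

    Each stream is first tried as FlateDecode (zlib) data; if decompression
    fails it is kept as-is. Other PDF filters are not handled in this sketch.
    """
    texts = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            raw = zlib.decompress(raw)
        except zlib.error:
            pass  # assume the stream was stored uncompressed
        texts.append(raw.decode("latin-1"))
    return texts


def char_features(js):
    """Compute simple character statistics of a script (illustrative features only)."""
    n = len(js) or 1  # avoid division by zero for an empty script
    counts = Counter(js)
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
    non_alnum = sum(1 for ch in js if not (ch.isalnum() or ch.isspace()))
    return {
        "length": len(js),
        "entropy": entropy,
        "escape_count": len(ESCAPE_RE.findall(js)),
        "non_alnum_ratio": non_alnum / n,
    }


def build_demo_pdf(js):
    """Build a minimal PDF-like byte string with one Flate-compressed stream (demo input)."""
    body = zlib.compress(js.encode("latin-1"))
    return (b"%PDF-1.4\n1 0 obj\n<< /Length " + str(len(body)).encode()
            + b" /Filter /FlateDecode >>\nstream\n" + body
            + b"\nendstream\nendobj\n%%EOF\n")


demo_js = 'var s = unescape("%u9090%u9090%u4141%u4141"); eval(s);'
streams = extract_stream_texts(build_demo_pdf(demo_js))
features = char_features(streams[0])
print(features)
```

In a fuller system the dictionary returned by `char_features` would be flattened into a numeric vector, and vectors from labeled malicious and benign samples would be used to train the SVM classifier. Deciding which decoded streams actually hold JS (for example by following `/JS` references) is also left out here.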
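【Degree-granting institution】: Jiangxi University of Science and Technology (江西理工大学)
【Degree level】: Master's
【Year degree conferred】: 2014
【Classification number】: TP393.092; TP393.08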

【References】