Research and Implementation of Content-Based Detection of Malicious Code in Web Pages

Published: 2018-11-26 13:18

[Abstract]: In recent years, malicious code such as worms, Trojans, and botnets has posed a persistent threat to Internet security. As Web 2.0 and cloud computing spread, more and more applications offer Web-based services, and the browser is starting to play the role of an operating system. Exploits against browsers and browser plug-ins have displaced exploits against operating systems and desktop applications, so malicious web pages have gradually become the main channel for spreading malicious code or launching attacks, and an important link in the underground economy. A malicious web page is a page that carries malicious content through which viruses, Trojans, and similar programs can spread or attack. This content is often called a "web page Trojan" (网页木马), although it is not a Trojan in the strict sense: it is malicious code that uses the web page as its medium. It is usually written in a scripting language such as JavaScript or VBScript, embedded in the page, and obfuscated in various ways to evade detection. Inserting such content into a page is known as "web page Trojan planting" (网页挂马). Malicious code in a web page exploits vulnerabilities in the user's browser or plug-ins to download and run malware such as adware, Trojans, and viruses without the user's knowledge. Legitimate pages can also be injected with malicious code, so users may be attacked even when they visit websites that appear normal. Because this code relies heavily on obfuscation, traditional anti-virus software has a high false-negative rate, which in turn leads more attackers to use web pages to distribute malware. Existing methods for detecting malicious web pages fall into static detection (based on page content or the URL), dynamic detection (based on the behavior triggered by browsing the page), and hybrids of the two.
Traditional static detection is simple and fast, but it can only match known signatures and copes poorly with obfuscated page code, so it produces many false negatives and false positives. Existing systems therefore mostly use dynamic detection: a browser running inside a virtual machine opens the page, and the system state is monitored for malicious behavior. Dynamic detection is more accurate but consumes considerable resources, so it cannot be used to scan web pages at Internet scale. This thesis proposes a lightweight method for detecting malicious code in web pages: it analyzes page content, extracts features, and uses machine learning to obtain a classification model automatically. To compensate for the weaknesses of static detection, a JavaScript virtual machine interprets the parts of a page that may be obfuscated, which improves the system's accuracy. The method works mainly on page source code and does not need to actually visit the page or monitor system behavior, so it uses fewer resources and runs faster while maintaining detection accuracy, and it can be applied to large-scale detection tasks such as those faced by search engines. Through a systematic analysis of the characteristics of malicious code in web pages, the thesis selects the features used for detection and completes the design and implementation of a prototype detection system. Experiments show that the system detects malicious web pages with reasonable accuracy and effectiveness.
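To make the content-based approach concrete, the following Python sketch shows what static feature extraction from page source might look like. It is only an illustration: the abstract does not list the features used in the thesis, so the feature names, the list of suspicious calls, and the helper names (`extract_features`, `shannon_entropy`, `_PageScanner`) are all assumptions chosen for this example.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical feature set: the abstract does not enumerate the thesis's features.
SUSPICIOUS_CALLS = ("eval", "unescape", "document.write", "fromCharCode")


class _PageScanner(HTMLParser):
    """Collects inline script text and counts iframes while parsing."""

    def __init__(self):
        super().__init__()
        self.iframes = 0
        self.hidden_iframes = 0
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
        elif tag == "iframe":
            self.iframes += 1
            attr_map = dict(attrs)
            # Zero-sized iframes are a common way to hide injected content.
            if attr_map.get("width") == "0" or attr_map.get("height") == "0":
                self.hidden_iframes += 1

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.scripts.append(data)


def shannon_entropy(text):
    """Character-level entropy in bits; packed or encoded code tends to score high."""
    if not text:
        return 0.0
    total = len(text)
    return -sum(c / total * math.log2(c / total) for c in Counter(text).values())


def extract_features(html):
    """Return a dict of static features computed from page source only."""
    scanner = _PageScanner()
    scanner.feed(html)
    scanner.close()
    js = "\n".join(scanner.scripts)
    literals = re.findall(r"\"[^\"]*\"|'[^']*'", js)
    features = {
        "iframe_count": scanner.iframes,
        "hidden_iframe_count": scanner.hidden_iframes,
        "script_length": len(js),
        "script_entropy": shannon_entropy(js),
        "longest_string_literal": max((len(s) for s in literals), default=0),
    }
    for name in SUSPICIOUS_CALLS:
        pattern = re.escape(name) + r"\s*\("
        features["count_" + name] = len(re.findall(pattern, js))
    return features
```

A feature dictionary of this kind could be turned into a numeric vector and passed to any standard classifier for training. The JavaScript virtual machine step that the abstract describes for handling obfuscated code is not modeled here.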
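[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year degree awarded]: 2011
[Classification number]: TP393.092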

[Co-cited literature]