Research on Malicious Web Page Detection Based on the Active SVM Algorithm
Published: 2018-11-11 18:03
[Abstract]: In the Internet era, new applications built on scripting languages and browser plug-in technology keep emerging. Alongside the convenience and speed these applications bring, however, deliberate attacks such as information leakage, information theft, data tampering, unauthorized deletion or insertion of data, and computer viruses have become increasingly rampant. Attacks that exploit Web threats are the main kind of attack faced by Internet users. Attackers craft attack code that exploits vulnerabilities in browsers or third-party plug-ins to achieve their goals. Malicious code authors produce large amounts of malicious code and use a variety of obfuscation techniques to obfuscate and transform malicious scripts so that they evade malicious code detection, which relies chiefly on signature matching; obfuscated JavaScript is the most prominent case. Applying these obfuscation methods yields a large number of malicious code variants, which exploit the immediacy and speed of the Internet and spread indiscriminately, threatening the information security of Internet users. This seriously hampers malicious code detection and makes obfuscated code the hardest part of Web malicious code to defend against. How to keep such attacks out of users' computers and protect their information is an urgent problem, and one that network security researchers have been working continuously to solve.
This thesis studies JavaScript obfuscation techniques and proposes feature extraction based on the TF-IDF algorithm, introducing the term weighting used in text classification so that features are extracted from JavaScript scripts in a more principled way. Experiments show that TF-IDF-based feature extraction performs considerably better than traditional feature extraction methods. The thesis also addresses the shortcomings of the traditional supervised SVM by introducing an active learning strategy from machine learning, which reduces manual work, improves efficiency, and makes the system highly automated. Experiments show that the malicious web page detection system based on Active SVM achieves better performance with fewer labeled samples and less human effort.
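The abstract names two techniques, TF-IDF feature extraction over JavaScript and an SVM trained with active learning, without giving their details. The following is a minimal sketch of how the two could fit together, not the thesis's implementation. Everything specific in it is an assumption made for illustration: the identifier-based tokenizer, the smoothed IDF formula, the linear SVM trained by sub-gradient descent, the uncertainty-sampling query rule (pick the unlabeled sample closest to the hyperplane), and all function names.

```python
import math
import re
from collections import Counter


def tokenize(script):
    # Assumption: identifier-like tokens stand in for the thesis's real features.
    return re.findall(r"[A-Za-z_$][A-Za-z0-9_$]*", script.lower())


def tfidf_vectors(scripts):
    """Return (vocabulary, one TF-IDF vector per script)."""
    docs = [Counter(tokenize(s)) for s in scripts]
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    # Smoothed IDF (an assumed variant): log((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n) / (1 + sum(t in d for d in docs))) + 1
           for t in vocab}
    vectors = []
    for d in docs:
        total = sum(d.values()) or 1  # avoid division by zero on empty scripts
        vectors.append([d[t] / total * idf[t] for t in vocab])
    return vocab, vectors


def decision(w, b, x):
    """Signed score of x under the linear model (w, b)."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=200):
    """Linear SVM fitted by sub-gradient descent on the regularized hinge loss.

    Labels must be +1 (malicious) or -1 (benign).
    """
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            violated = yi * decision(w, b, xi) < 1
            w = [wj * (1 - eta * lam) for wj in w]  # L2 shrinkage
            if violated:  # hinge term is active: step toward the sample
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b


def select_query(w, b, pool):
    """Uncertainty sampling: index of the pool sample nearest the hyperplane."""
    return min(range(len(pool)), key=lambda i: abs(decision(w, b, pool[i])))


def active_svm(X, oracle, seed_idx, budget):
    """Query `oracle` (a human labeler) for at most `budget` extra labels.

    `seed_idx` must contain at least one sample of each class.
    """
    labeled = {i: oracle(i) for i in seed_idx}
    w, b = train_linear_svm([X[i] for i in labeled], list(labeled.values()))
    for _ in range(budget):
        pool_idx = [i for i in range(len(X)) if i not in labeled]
        if not pool_idx:
            break
        q = pool_idx[select_query(w, b, [X[i] for i in pool_idx])]
        labeled[q] = oracle(q)  # only this sample costs labeling effort
        w, b = train_linear_svm([X[i] for i in labeled], list(labeled.values()))
    return w, b, labeled
```

In this sketch the labeling cost is exactly the seed set plus `budget` oracle calls, which is the sense in which an active learner can reach a usable classifier with fewer labeled samples than training on a fully labeled set. A kernel SVM or a different query strategy could be substituted without changing the loop structure.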
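[Degree-granting institution]: Nanjing University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP393.08
Article No.: 2325691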
Article link: https://www.wllwen.com/guanlilunwen/ydhl/2325691.html