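Research on Phishing Website Detection Technology Based on Deep Learning
Topic of this thesis: phishing + feature extraction; Source: University of Electronic Science and Technology of China, master's thesis, 2017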
[Abstract]: In the era of big data, network security remains a critical topic. Amid massive volumes of information, there is no shortage of criminals who use the Internet to win users' trust and profit from it, and phishing websites are one such means. The URL, page content, and layout of a phishing website closely resemble those of the legitimate site, so users without security awareness are easily deceived, often with serious consequences. Effectively curbing phishing websites is therefore essential to network security. Research at home and abroad on defending against phishing websites has produced results, but every approach has shortcomings. Typical existing detection methods include detection based on blacklists and whitelists, matching-based detection using text features or page image features, and classification-based detection using machine learning. However, blacklist and whitelist methods respond slowly to new sites and have limited list coverage, while feature-based algorithms fall short in accuracy and robustness. In recent years machine learning has been applied with great success in many fields; in particular, applying deep learning to detection and recognition can effectively improve detection efficiency. In view of this, this thesis studies existing techniques and proposes a robust phishing website detection method based on deep learning.
The research covers the following. Feature extraction is the foundation of phishing website identification and a key step in it, and a good feature extraction method is crucial to the detection result. Based on a survey of phishing website features and a summary of prior work, this thesis combines the web page with its URL and extracts key features of abnormal page content and abnormal links, respectively. To increase detection speed and reduce the misclassification rate, a URL filter is used, and similarity detection is performed on the crawled URLs to further improve detection accuracy. The URL features and page features are then preprocessed and stored as feature vectors for the next module to detect and classify.
The emergence of deep learning in recent years, together with its strong feature learning ability, has brought great success across application fields. This thesis therefore studies a deep-learning-based classification method for phishing websites and proposes a multi-layer DBN-KNN model. The model is applied to the recognition of phishing website features: it learns from, trains on, and classifies the extracted feature vectors, and phishing websites are identified from the classification results.
In summary, this thesis addresses the shortcomings of existing detection methods by studying a phishing website detection method based on deep learning. First, phishing website data is crawled, and URL filtering and similarity detection are performed. Next, the key features of phishing websites are manually analyzed, extracted, and preprocessed. Finally, the deep learning model DBN-KNN is proposed to train on and classify the feature vectors and thereby identify phishing websites.
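The URL filtering, similarity detection, and feature-vector steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not specify the similarity measure or the feature set, so the use of edit distance, the `TRUSTED_DOMAINS` list, the function names, and the six features chosen here are all assumptions made for illustration.

```python
import re
from urllib.parse import urlparse

# Hypothetical whitelist of protected domains (illustrative values only).
TRUSTED_DOMAINS = ["paypal.com", "icbc.com.cn", "taobao.com"]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalize edit distance to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def host_of(url: str) -> str:
    """Extract the host name, dropping a leading 'www.' if present."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def url_filter(url: str) -> bool:
    """Assumed filter rule: an exact match with a trusted domain is
    treated as legitimate and skips the more expensive later stages."""
    return host_of(url) in TRUSTED_DOMAINS


def url_features(url: str) -> list[float]:
    """Build an illustrative URL feature vector (assumed features)."""
    host = host_of(url)
    best = max(similarity(host, d) for d in TRUSTED_DOMAINS)
    is_ip = re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host) is not None
    return [
        float(len(url)),              # overall URL length
        float(host.count(".")),       # number of dots in the host
        1.0 if is_ip else 0.0,        # host is a raw IPv4 address
        1.0 if "@" in url else 0.0,   # '@' can hide the real host
        1.0 if "-" in host else 0.0,  # hyphenated look-alike domains
        best,                         # closest match to a trusted domain
    ]
```

In this sketch a host that is very similar to a trusted domain without being identical, such as `paypa1.com`, passes the filter and receives a high similarity score, which a downstream classifier could treat as a suspicious signal. The page-content features and the DBN-KNN classifier are outside the scope of this example.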
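[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP393.092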