
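Research on Spam Filtering Techniques in Adversarial Environments

Published: 2018-03-11 07:51

  Topic: spam filtering  Focus: adversarial environments  Source: South China University of Technology, 2015 master's thesis  Type: degree thesis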


[Abstract]: With the development and growing popularity of the Internet, email has become one of the main channels of everyday communication, bringing convenience to daily life and work. Large volumes of spam have appeared along with it, however. Spam disrupts normal communication and causes substantial economic losses to society, so effectively curbing its spread has become an increasingly pressing problem. Thanks to rapid progress in artificial intelligence, many machine learning methods have been applied to spam filtering with good results. In an adversarial environment, however, spammers try to exploit the weaknesses of machine learning algorithms, disguising spam in various ways to degrade the effectiveness of email classifiers. The study of classification in such adversarial environments is known as adversarial learning. The evasion attack is one of the techniques spammers use most often: by inserting good words and deleting bad words, it reduces a message's spam characteristics while preserving its original meaning, and so evades the filter and lowers the classification effectiveness of the filtering system.

This thesis systematically analyzes the origin and recent development of spam and summarizes the current state of research on spam filtering in adversarial environments. The traditional TFIDF method uses term frequency to represent feature weights, so under a good word attack the weights of bad words drop sharply, which degrades the classifier. This thesis therefore proposes an improved feature representation, SRTFIDF, to reduce the effect of good word attacks on feature weights. Experimental results show that under good word attacks the improved representation is more robust than traditional TFIDF.

Compared with a single classifier system, a multiple classifier system can improve precision and robustness, but studies show that traditional multiple classifier systems perform poorly under evasion attacks. This thesis therefore proposes a segmented multiple classifier spam filtering method based on multi-instance learning to counter evasion attacks. The feature space is divided evenly into two instances, and several sub-classifiers are trained for each instance to improve robustness. The proposed methods are validated on the CEAS2008 English corpus. The final experimental results show that, under both good word attacks and evasion attacks, the segmented multiple classifier system achieves better precision and robustness than the traditional multiple classifier system.
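The effect of a good word attack on term-frequency-based weights can be illustrated with a minimal sketch. It assumes a common TF-IDF variant (length-normalized term frequency times a smoothed inverse document frequency) and a hypothetical three-message toy corpus; the function name `tfidf` and all data are illustrative and do not come from the thesis. The thesis's SRTFIDF weighting is not reproduced here, because the abstract does not define it.

```python
import math
from collections import Counter


def tfidf(doc_tokens, corpus):
    """Return {term: weight} for one document.

    Assumed variant: tf = count / document length,
    idf = ln((1 + N) / (1 + df)) + 1, with N and df taken from `corpus`.
    """
    counts = Counter(doc_tokens)
    n_docs = len(corpus)
    weights = {}
    for term, count in counts.items():
        tf = count / len(doc_tokens)
        df = sum(1 for doc in corpus if term in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        weights[term] = tf * idf
    return weights


# Hypothetical training corpus: one spam message and two legitimate ones.
corpus = [
    ["cheap", "pills", "buy", "now"],
    ["meeting", "agenda", "project", "report"],
    ["project", "deadline", "report", "thanks"],
]

spam = ["cheap", "pills", "buy", "now"]
good_words = ["meeting", "project", "report", "thanks"]

# Good word attack: append words typical of legitimate mail to the spam.
attacked = spam + good_words

before = tfidf(spam, corpus)
after = tfidf(attacked, corpus)

for term in spam:
    print(f"{term}: {before[term]:.4f} -> {after[term]:.4f}")
```

In this toy setting, appending four good words to a four-token message doubles its length, so the weight of every bad word is halved even though none of the bad words was removed. This dilution is the weakness that a more robust feature weighting would need to limit.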
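[Degree-granting institution]: South China University of Technology
[Degree level]: Master's
[Year degree awarded]: 2015
[Classification number]: TP393.098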



