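Research on Spam Filtering Methods Based on Email Intent and Fingerprint Analysis
Published: 2018-05-09 14:10
Thesis topics: spam + feature selection; Source: Xiamen University, master's thesis, 2014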
[Abstract]: With the rapid growth of the Internet, email has become a very popular communication tool, widely used for personal communication and in enterprise environments. The spam that accompanies it, however, poses serious security risks to network users, including wasted user time, storage space, and valuable network bandwidth. More and more people now specialize in producing spam, and any enterprise or Internet service that exposes an email interface is affected; even mainstream social networking sites and well-known companies such as Tencent, Sina Weibo, and Google are no exception. The explosive growth of spam has in turn produced many anti-spam technologies, and this arm-wrestling style of technical confrontation has made anti-spam methods more mature year by year. Although spam can be brought under control temporarily after a new anti-spam technology is deployed, spammers keep developing counter-filtering measures and newer spamming techniques. In view of this situation, this thesis carries out the following research work:
● It analyzes the design principles, implementation methods, and current state of existing anti-spam systems, summarizes the characteristics of typical anti-spam technologies, and identifies new trends in spam identification.
● By analyzing a large number of spam and legitimate emails, it finds that the senders of the two kinds of mail differ in how their intent is expressed. It selects intent features for classification on that basis, studies how to extract these features efficiently and accurately, and verifies their efficiency and accuracy on a test data set.
● Working from a large body of mail, it identifies image-based spam and its characteristics, analyzes how image spam differs from text spam, examines the shortcomings of machine-learning-based techniques against image spam, and proposes an image filtering selection method based on statistical probability, rules, and a voting mechanism.
● It constructs an efficient hash generation algorithm that samples the body text and attachments of an email, computes hash values, and generates a fingerprint file, which is then compared against an online fingerprint database to decide whether the email is spam.
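The fingerprint step described in the abstract can be illustrated with a minimal sketch. This is not the thesis's algorithm: the abstract does not specify the hash function, the sampling scheme, or the matching rule, so SHA-256, fixed-size window sampling, the overlap threshold, and every function name below are assumptions chosen for illustration. The online fingerprint database is stood in for by an in-memory set.

```python
import hashlib
import re


def normalize(text):
    """Lowercase and collapse whitespace so trivial edits do not change the hashes."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def sample_fingerprints(data, window=32, step=16):
    """Hash fixed-size windows of `data` taken every `step` bytes.

    Inputs no longer than one window are hashed whole.
    Returns a set of hex digests (hypothetical sampling scheme).
    """
    if len(data) <= window:
        return {hashlib.sha256(data).hexdigest()}
    return {
        hashlib.sha256(data[i:i + window]).hexdigest()
        for i in range(0, len(data) - window + 1, step)
    }


def email_fingerprint(body, attachments):
    """Combine sampled hashes of the normalized body and of each attachment."""
    fingerprint = sample_fingerprints(normalize(body).encode("utf-8"))
    for blob in attachments:
        fingerprint |= sample_fingerprints(blob)
    return fingerprint


def is_spam(fingerprint, known_spam, threshold=0.5):
    """Flag the email when enough of its hashes appear in the known-spam set."""
    if not fingerprint:
        return False
    overlap = len(fingerprint & known_spam) / len(fingerprint)
    return overlap >= threshold


# Stand-in for the online fingerprint database: hashes of one known spam email.
attachment = b"\x89PNG fake image bytes" * 4
known_spam = email_fingerprint(
    "Buy cheap watches now! Limited offer, click here to order today.",
    [attachment],
)

# The same message with changed case and spacing still matches.
resent = email_fingerprint(
    "BUY  cheap watches NOW!  Limited offer, click here to order today.",
    [attachment],
)
# An unrelated message shares no sampled hashes with the database.
ham = email_fingerprint(
    "Hi Bob, the meeting moved to 3pm tomorrow, see you in room 204 then.",
    [],
)
```

Fixed-offset windows keep the sketch short, but a single inserted character shifts every later window, so a deployed system would likely need a sampling scheme that tolerates such edits.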
[Degree-granting institution]: Xiamen University
[Degree level]: Master's
[Year degree conferred]: 2014
[Classification number]: TP393.098
Article ID: 1866326
Article link: https://www.wllwen.com/guanlilunwen/ydhl/1866326.html