
Design and Implementation of a VSTO-Based Spam Filtering System

Published: 2018-04-04 11:32

  Topic: spam. Approach: naive Bayes. Source: Xidian University (西安电子科技大学), master's thesis, 2012.


[Abstract]: Spam filtering is an important and pressing problem in current Internet applications and is attracting growing attention. Broadly speaking, spam is the same email sent by a single sender to a large number of different users at the same time; its content consists mainly of advertisements and political propaganda. Receiving such email regularly is irritating, and large volumes of it disrupt the normal use of email. Spam filtering is essentially a text classification problem, and the naive Bayes classifier is a simple yet effective method for it. Its weakness is that it assumes all attributes are mutually independent, an assumption that real applications often fail to satisfy. Dropping the conditional independence assumption, however, inevitably leads to combinatorial explosion. For this reason, spam filtering algorithms based on improved Bayesian methods have drawn attention from a growing number of researchers.

This thesis first studies spam filtering methods and the corresponding filtering algorithms, compares the strengths and weaknesses of several typical algorithms, examines the protocols for sending and receiving email, and surveys the current state of research on spam filtering. Based on the working principles of email systems, it focuses on spam filtering techniques based on Bayesian networks and, using worked examples, analyzes the classification characteristics and accuracy of naive Bayes email classification. It identifies a lack of client-side mail filtering software and, on that basis, designs a client-side mail filtering system. Finally, an automatic mail filtering system is implemented with VSTO on top of Outlook. The system integrates several filtering mechanisms, including manual rules, blacklists, whitelists, automatic rules, single machine-learning filters, and ensemble-learning filters. It processes newly received mail mainly on the client computer, uses mail already classified as spam or legitimate as its experimental data to obtain the corresponding feature patterns, and then learns from those features to perform filtering.

Testing shows that the system is functionally complete and filters effectively: precision is ≥95%, the false rejection rate is ≤2%, and the false acceptance rate is ≤10%, which gives it considerable value for wider deployment. The system can also be used as a filtering plug-in for Outlook, automatically filtering the mail in the Outlook inbox.

The accuracy and recall of anti-spam measures have always been key research directions for spam filtering systems; future work will continue to concentrate on this area to keep improving anti-spam performance.
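To make the naive Bayes approach named in the abstract concrete, the following is a minimal sketch of a word-based naive Bayes spam classifier with Laplace smoothing. It is an illustration only, not the thesis's implementation: the thesis system is built with VSTO for Outlook, whereas this sketch is in Python, and the class name `NaiveBayesSpamFilter`, the `tokenize` helper, the labels `"spam"` and `"ham"`, and the toy training messages are all assumptions introduced here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (hypothetical helper)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Illustrative naive Bayes text classifier; not the thesis's actual code."""

    LABELS = ("spam", "ham")

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = {label: 0 for label in self.LABELS}
        self.vocab = set()

    def train(self, text, label):
        """Add one already-labelled message to the training statistics."""
        tokens = tokenize(text)
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def log_score(self, text, label):
        """Return log P(label) + sum of log P(word | label) over known words.

        The sum over words is where the conditional independence
        assumption enters: each word contributes independently.
        Assumes every label has at least one training message.
        """
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        denom = total_words + self.alpha * len(self.vocab)
        for token in tokenize(text):
            if token in self.vocab:  # words never seen in training are skipped
                count = self.word_counts[label][token]
                score += math.log((count + self.alpha) / denom)
        return score

    def classify(self, text):
        """Pick the label with the higher log score."""
        return max(self.LABELS, key=lambda label: self.log_score(text, label))


if __name__ == "__main__":
    spam_filter = NaiveBayesSpamFilter()
    # Toy training data, made up purely for illustration.
    spam_filter.train("cheap pills buy now", "spam")
    spam_filter.train("win money now free offer", "spam")
    spam_filter.train("meeting agenda for monday", "ham")
    spam_filter.train("project report attached please review", "ham")
    print(spam_filter.classify("free money offer now"))
    print(spam_filter.classify("please review the project agenda"))
```

Working in log space avoids numeric underflow when many word probabilities are multiplied, and the smoothing constant keeps a word seen in only one class from driving the other class's probability to zero. In a client-side filter of the kind the abstract describes, a classifier like this would plausibly run only after whitelist, blacklist, and rule checks have failed to decide the message, though the abstract does not specify the ordering.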

Article ID: 1709776


Article link: https://www.wllwen.com/wenyilunwen/guanggaoshejilunwen/1709776.html

