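A Mahalanobis-Taguchi System Based on an Improved Measurement Scale and Threshold Determination Method, and Its Application to Email Filtering
Published: 2018-11-04 09:53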
[Abstract]: With the development of the Internet and the spread of mobile devices, email has become an important means of communication. At the same time, the large volume of spam poses many challenges for users and service providers, and email filtering has become an active research topic in recent years. The Mahalanobis-Taguchi System (MTS) is a pattern recognition and classification method for multivariate data. It makes no assumptions about the distribution of the data and performs classification after reducing the set of feature variables. This thesis proposes targeted improvements that address the shortcomings of the traditional MTS in its measurement scale and threshold calculation, and applies the improved MTS to email filtering. The work covers three aspects:
(1) A new measurement scale for the MTS based on the grey relational degree. The value of the measurement scale reflects how close samples are to one another and is the basis for assigning a sample to a class. The MTS uses the Mahalanobis distance to measure how close a sample is to the reference space. This statistic accounts for correlations among variables but ignores the similarity between the sequence curve of the sample and that of the population. The grey relational model is a relatively new and broadly applicable method for computing the shape similarity of sequence curves. To measure the similarity between samples more completely, the thesis combines the grey relational degree and the Mahalanobis distance through linear weighting to build a new sample measurement scale, which improves the accuracy of the MTS.
(2) A threshold determination method for the MTS based on the receiver operating characteristic (ROC) curve. Threshold calculation for the MTS has long attracted attention, but the many existing methods all have limitations of varying degree and are hard to generalize. The ROC curve is a method dedicated to analyzing diagnostic performance and computing system thresholds, and it is used mainly in medical diagnosis. The thesis applies the ROC curve to the MTS so that the threshold is more objective and precise.
(3) Email filtering based on the improved MTS. The improved MTS is applied to email filtering. The final comparison shows that, relative to the traditional MTS, the improved MTS performs significantly better in accuracy, false-alarm rate, and detection rate, which indicates that the improvements are effective and feasible. Compared with other commonly used email filtering methods, the improved MTS achieves relatively high accuracy, and its screening of feature variables can substantially reduce cost and improve filtering efficiency.
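The abstract gives no formulas, so the following is only a minimal sketch of the two ideas it names: a measurement scale that linearly weights the Mahalanobis distance with a grey relational degree, and a threshold read off the ROC curve. Every function name, the weight `w`, the distinguishing coefficient `rho`, the mapping of the distance into [0, 1), and the use of Youden's index to pick the ROC cut-off are assumptions made for illustration, not the thesis's actual definitions.

```python
import numpy as np

def scaled_mahalanobis(X, ref):
    """Scaled Mahalanobis distance of each row of X to the reference group `ref`."""
    mean, std = ref.mean(axis=0), ref.std(axis=0, ddof=1)
    Z = (X - mean) / std
    corr_inv = np.linalg.pinv(np.corrcoef((ref - mean) / std, rowvar=False))
    # MTS convention: squared distance divided by the number of variables
    return np.einsum("ij,jk,ik->i", Z, corr_inv, Z) / X.shape[1]

def grey_relational_grade(X, ref, rho=0.5):
    """Deng-style grey relational grade of each row of X to the reference mean.
    Assumption: differences are taken on standardized variables."""
    mean, std = ref.mean(axis=0), ref.std(axis=0, ddof=1)
    delta = np.abs((X - mean) / std)
    d_min, d_max = delta.min(), delta.max()
    coef = (d_min + rho * d_max) / (delta + rho * d_max)
    return coef.mean(axis=1)  # in (0, 1]; 1 means most similar

def combined_scale(X, ref, w=0.5, rho=0.5):
    """Hypothetical linear weighting: larger values mean 'farther from normal'."""
    md = scaled_mahalanobis(X, ref)
    md01 = md / (1.0 + md)  # assumption: squash the distance into [0, 1)
    return w * md01 + (1.0 - w) * (1.0 - grey_relational_grade(X, ref, rho))

def roc_threshold(scores, labels):
    """Cut-off maximizing Youden's J = TPR - FPR (labels: 1 = spam, 0 = normal)."""
    scores, labels = np.asarray(scores), np.asarray(labels)
    pos, neg = labels == 1, labels == 0
    best_t, best_j = None, -np.inf
    for t in np.unique(scores):  # each observed score is a candidate cut-off
        pred = scores >= t
        j = (pred & pos).sum() / pos.sum() - (pred & neg).sum() / neg.sum()
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    normal = rng.normal(0.0, 1.0, size=(200, 4))  # synthetic "normal mail" features
    spam = rng.normal(2.0, 1.0, size=(200, 4))    # synthetic "spam" features
    X = np.vstack([normal, spam])
    y = np.r_[np.zeros(200, dtype=int), np.ones(200, dtype=int)]
    threshold, j = roc_threshold(combined_scale(X, normal), y)
    print("threshold:", threshold, "Youden J:", j)
```

The data in the demo is synthetic and only exercises the functions. Youden's index is one common way to read a cut-off from an ROC curve; the thesis may use a different criterion.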
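[Degree-granting institution]: Nanjing University of Science and Technology
[Degree level]: Master's
[Year conferred]: 2015
[Classification number]: TP393.098;N941.5
Article ID: 2309476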
Article link: https://www.wllwen.com/guanlilunwen/ydhl/2309476.html