A Semantics-Based Spam SMS Filtering Algorithm
Published: 2018-10-18 19:41
[Abstract]: Spam SMS filtering is a text classification task: messages received by a user are classified as either legitimate or spam so that spam can be blocked. This paper improves on the naive Bayes classification algorithm in two ways. First, because SMS messages are short and carry little information, synonym sets are introduced to expand the feature words in each message, which reduces the negative effect on classification of synonymous feature words being scattered across separate features. Second, to exploit the special information that spam messages contain, the paper proposes the notion of a pattern concept and replaces feature words that share a pattern with that pattern concept, which concentrates the features of spam messages and strengthens the classifier's ability to identify spam. Finally, experiments compare the classification performance of the naive Bayes algorithm with that of the improved algorithm and confirm the effectiveness of the improvement.
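The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the synonym table, the regular expressions, the placeholder tokens such as `<URL>`, and the names `extract_features` and `NaiveBayesFilter` are all hypothetical. Synonym handling is simplified to mapping each word onto one representative of its synonym set, and English tokens are used for readability (Chinese messages would first need word segmentation). The classifier is a standard multinomial naive Bayes with Laplace smoothing.

```python
import math
import re
from collections import Counter

# Hypothetical synonym sets: each word maps to one representative feature,
# so synonymous words are counted as the same feature.
SYNONYMS = {
    "cheap": "discount", "sale": "discount", "bargain": "discount",
    "prize": "reward", "bonus": "reward",
}

# Hypothetical pattern concepts: every string matching a regex is replaced
# by a single placeholder token.
PATTERNS = [
    (re.compile(r"https?://\S+|www\.\S+"), "<URL>"),
    (re.compile(r"[$¥]\s?\d+(?:\.\d+)?"), "<MONEY>"),
    (re.compile(r"\b\d{7,}\b"), "<PHONE>"),
]


def extract_features(text):
    """Replace pattern matches with concepts, tokenize, merge synonyms."""
    text = text.lower()
    for regex, concept in PATTERNS:
        text = regex.sub(f" {concept} ", text)
    tokens = re.findall(r"<[A-Z]+>|[a-z']+", text)
    return [SYNONYMS.get(tok, tok) for tok in tokens]


class NaiveBayesFilter:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()
        self.vocab = set()

    def train(self, messages):
        # messages: iterable of (text, label) pairs, label is "spam" or "ham".
        for text, label in messages:
            feats = extract_features(text)
            self.word_counts[label].update(feats)
            self.doc_counts[label] += 1
            self.vocab.update(feats)

    def predict(self, text):
        # Assumes at least one training message was seen for each class.
        feats = extract_features(text)
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)
            for feat in feats:
                score += math.log(
                    (counts[feat] + 1) / (total_words + len(self.vocab))
                )
            scores[label] = score
        return max(scores, key=scores.get)


TRAINING = [
    ("Win a prize now! Call 13800001111", "spam"),
    ("Cheap loans, visit http://spam.example.com", "spam"),
    ("Big sale! bonus $100 call 13900002222", "spam"),
    ("Are we still meeting for lunch tomorrow?", "ham"),
    ("Please call me when you get home", "ham"),
    ("The report is attached, see you at the meeting", "ham"),
]

spam_filter = NaiveBayesFilter()
spam_filter.train(TRAINING)
```

In this toy data, "bargain" never occurs in training and the phone number in a new message is unseen, yet `spam_filter.predict("Bargain reward! call 13700003333")` can still draw on the counts for `discount` and `<PHONE>`, which is the concentration effect the abstract describes.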
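[Author affiliation]: School of Information Engineering, Nanjing Normal University Taizhou College;
[Funding]: Jiangsu Province College Students' Innovation Training Program (201613843015Y); Ministry of Education-Google 2014 University-Industry Cooperation Project (PO640068)
[Classification number]: TP391.1
Article ID: 2280169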
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2280169.html