Construction of an Improved Spam SMS Interception Platform Based on Bayesian Intelligent Classification

Published: 2018-08-27 12:28

[Abstract]: With the rapid development of mobile communication networks, SMS has become one of the main ways people exchange information in real time, and the spam that spreads in large volumes through SMS has become a long-standing nuisance for users. Spam SMS interception technology has had some initial success, but it only identifies spam by single keywords and filters calling numbers that send large numbers of repeated messages with identical content. Meanwhile, senders of advertising, fraudulent, and even illegal spam have reaped large profits, and statistics show that, to evade these real-time strategies, the ways illegal spam survives and spreads have changed substantially. Recent spam can be analyzed statistically to derive matching keyword combinations and interception strategies, but doing so demands considerable manual effort and only temporarily blocks spam that fits the derived rules; this is one of the main reasons spam SMS cannot be eradicated. Combining a self-learning intelligent classification system with a real-time spam interception system is therefore important for identifying, filtering, and controlling spam over the long term. Building on a review of the relevant theory, this thesis briefly describes Chinese word segmentation, text classification, and the key statistical learning methods, and selects Bayesian classification as the main algorithm for intelligent classification learning. It then describes in detail the requirements analysis and system design of the improved spam interception system. The system functions, processing flow, and module design in particular show that the Bayesian intelligent classification system and the real-time interception system can be combined feasibly and completely, and the reports the system generates serve as the basis for subsequent spam analysis and handling. Tests of the Bayesian intelligent classification system show that, once the message text has been preprocessed, the Bayesian classification algorithm assigns messages to categories fairly accurately; compared with manual classification, the intelligent classification system achieves higher accuracy when processing large batches of messages. In summary, the spam SMS interception system based on Bayesian classification remedies the original system's lack of self-learning ability, substantially reduces dependence on manual operation, and supports both long-term tracking and analysis of how spam changes and real-time handling of spam. Through the dual safeguard of real-time interception and non-real-time intelligent analysis, the improved system filters messages effectively and ensures that the interception system runs stably after going live.
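The abstract does not give the thesis's implementation, so the following is only a minimal Python sketch, under stated assumptions, of the two-stage design it describes: a real-time rule stage (single-keyword matching, plus blocking a sender that repeats an identical message) followed by a non-real-time Bayesian classifier over the messages that pass. Multinomial naive Bayes with add-one smoothing is used here as one common form of Bayesian text classification. Overlapping character bigrams stand in for Chinese word segmentation. The names `NaiveBayes`, `RealTimeFilter`, `screen`, the `max_repeats` threshold, the `"spam"`/`"normal"` labels, and the toy messages are all hypothetical and do not come from the thesis.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Stand-in for Chinese word segmentation (an assumption of this sketch):
    overlapping character bigrams, so no segmenter library is needed."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return chars
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # label -> number of training messages
        self.word_counts = defaultdict(Counter)  # label -> token frequencies
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        # Tokens never seen in training carry no evidence and are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # log P(label) + sum of smoothed log P(token | label)
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / (total_words + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class RealTimeFilter:
    """Hypothetical rule stage: block on a keyword hit, or when one sender
    repeats the same text more than max_repeats times."""

    def __init__(self, keywords, max_repeats):
        self.keywords = list(keywords)
        self.max_repeats = max_repeats
        self.seen = Counter()  # (sender, text) -> times seen so far

    def is_blocked(self, sender, text):
        self.seen[(sender, text)] += 1
        if any(k in text for k in self.keywords):
            return True
        return self.seen[(sender, text)] > self.max_repeats


def screen(messages, rt_filter, model):
    """Stage 1 checks each message as it arrives; stage 2 classifies the
    survivors as a batch and returns those the model labels "spam"."""
    blocked, passed = [], []
    for sender, text in messages:
        if rt_filter.is_blocked(sender, text):
            blocked.append((sender, text))
        else:
            passed.append((sender, text))
    flagged = [(s, t) for s, t in passed if model.predict(t) == "spam"]
    return blocked, flagged


if __name__ == "__main__":
    # Hypothetical toy training data, for illustration only.
    nb = NaiveBayes().fit(
        ["恭喜中奖请点击链接领取", "免费领取大奖点击链接", "今晚一起吃饭吗", "明天开会别迟到"],
        ["spam", "spam", "normal", "normal"],
    )
    rt = RealTimeFilter(keywords=["发票"], max_repeats=2)
    msgs = [("A", "代开发票请联系"), ("B", "点击链接领取奖品"), ("C", "明天一起吃饭")]
    blocked, flagged = screen(msgs, rt, nb)
    print(blocked)  # [('A', '代开发票请联系')]
    print(flagged)  # [('B', '点击链接领取奖品')]
```

In this sketch the rule stage is cheap enough to run on every incoming message, while the classifier can simply be refit on newly labeled messages. That is one way to provide the self-learning ability the abstract says the keyword-only system lacked, without hand-writing new keyword rules.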
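[Degree-granting institution]: Shanghai Normal University
[Degree level]: Master's
[Year degree awarded]: 2013
[Chinese Library Classification]: TP18; TP391.1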
