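Research and Application of Key Technologies for Text Filtering Based on the Cloud Service Model
Published: 2018-06-19 15:27
Topic: text filtering + classification; Source: University of Electronic Science and Technology of China (电子科技大学), master's thesis, 2014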
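【Abstract】: The rapid growth of the Internet has made it one of the main ways people exchange information. Because of its openness, however, the network also carries a great deal of junk information, such as pornographic, violent, superstitious, and reactionary content, which seriously affects people's everyday online activity. Many text filtering techniques already exist, but they must keep improving as the external environment changes. Meanwhile, as living standards rise, more and more users access the Internet through mobile terminals. Ensuring that mobile users obtain healthy, useful, and legitimate information on their devices requires implementing text filtering on a cloud platform for mobile terminals, so that junk web pages can be filtered out. To meet this need, this thesis analyzes and discusses the existing key technologies of text filtering, improves the traditional text classification algorithm based on the vector space model as well as the naive Bayes classification algorithm, and uses the two improved algorithms to build a high-performance text filtering system. The system is then deployed on a cloud platform for mobile terminals to provide a text filtering service there, which ensures that mobile users can reach normal, legitimate web pages through their devices. The main contributions are: 1. After analyzing the feature selection algorithms commonly used in text filtering, the idea of equal proportion is applied to feature selection, so that the extracted text feature vector reflects the topic and category of a text more accurately. 2. After analyzing and discussing the existing weight calculation methods in text filtering, the traditional weight calculation is improved by taking into account the structure, length, and proportion information of each feature term, so that the weight better reflects how important the term is to web page classification. 3. Because a web page is a structured or semi-structured document, the thesis classifies web pages in a modular way. The proportion-based improved weight calculation and the equal-proportion feature selection are applied to the traditional vector-space-model classifier and the naive Bayes classifier, the two improved classifiers are used to build a high-performance web page filtering system, and the system is deployed on a cloud platform to provide a text filtering service. Test results show that, compared with the traditional algorithms, the improved text classification algorithms achieve higher classification accuracy and precision and lower misjudgment and error rates, so the improved text filtering system performs better.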
[Abstract]: With its rapid development, the Internet has become one of the main ways for people to exchange information. Because of its openness, however, the Internet contains a large amount of junk information, such as pornography, violence, superstition, and reactionary content, which seriously affects people's daily online activities. Although many text filtering technologies exist, they need continual improvement as the external environment changes. At the same time, as living standards keep rising, more and more users access the Internet through mobile terminals. Ensuring that mobile users can obtain healthy, useful, and legitimate information through their devices requires implementing text filtering on a cloud platform for mobile terminals, so that junk web pages are filtered out. Based on an analysis and discussion of the existing key technologies of text filtering, this paper improves the traditional text classification algorithm based on the vector space model and the naive Bayes classification algorithm. The two improved algorithms are used to construct a high-performance text filtering system, which is then deployed on a mobile-terminal-oriented cloud platform to provide a text filtering service. This ensures that mobile users can access normal, legitimate web pages through their devices. The main contributions of this paper are: 1. Based on an analysis of the feature selection algorithms commonly used in text filtering, the idea of equal proportion is applied to feature selection, so that the extracted text feature vector reflects the text's topic and category information more accurately.
2. Based on an analysis and discussion of the existing weight calculation methods in text filtering, the structure, length, and proportion information of feature terms is taken into account to improve the traditional weight calculation, so that it better reflects the importance of each feature term to web page classification. 3. A web page is a structured or semi-structured document, so this paper classifies web pages in a modular way. The improved proportion-based weight calculation and the equal-proportion feature selection are applied to the traditional vector-space-model classification algorithm and the naive Bayes classification algorithm. The two improved classification algorithms are then used to construct a high-performance web page filtering system, which is deployed on the cloud platform to provide a text filtering service. The test results show that, compared with the traditional algorithms, the improved text classification algorithms have higher classification accuracy and precision and lower misjudgment and error rates, and that the improved text filtering system therefore performs better.
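The abstract does not give the thesis's formulas, so the following Python sketch only illustrates the general shape of the ideas it names: structure-aware term weighting, feature selection with an equal quota per class, and a naive Bayes classifier. Everything concrete here is an assumption made for illustration: the `FIELD_WEIGHTS` values, the reading of "equal proportion" as "the same number of features per class", the scoring rule inside `select_features`, and the toy pages. It is not the author's method.

```python
import math
from collections import Counter, defaultdict

# Hypothetical structure weights: a term in the title counts more than in the body.
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0}

def weighted_counts(page):
    """Map a page {'title': [...], 'body': [...]} to structure-weighted term counts."""
    counts = Counter()
    for field, tokens in page.items():
        for tok in tokens:
            counts[tok] += FIELD_WEIGHTS.get(field, 1.0)
    return counts

def select_features(pages, labels, per_class):
    """Keep the same number of top-ranked terms for every class (assumed reading)."""
    class_totals = defaultdict(Counter)
    for page, label in zip(pages, labels):
        class_totals[label].update(weighted_counts(page))
    overall = Counter()
    for totals in class_totals.values():
        overall.update(totals)
    selected = set()
    for totals in class_totals.values():
        # Rank by the share of a term's total weight that falls in this class,
        # then by weight within the class, then alphabetically for determinism.
        ranked = sorted(totals, key=lambda t: (-totals[t] / overall[t], -totals[t], t))
        selected.update(ranked[:per_class])
    return selected

def train_nb(pages, labels, features):
    """Multinomial naive Bayes over weighted counts, with add-one smoothing."""
    term_weight = defaultdict(Counter)
    doc_count = Counter(labels)
    for page, label in zip(pages, labels):
        for tok, w in weighted_counts(page).items():
            if tok in features:
                term_weight[label][tok] += w
    model = {}
    for label, n_docs in doc_count.items():
        denom = sum(term_weight[label].values()) + len(features)
        model[label] = {
            "prior": math.log(n_docs / len(labels)),
            "loglik": {t: math.log((term_weight[label][t] + 1.0) / denom)
                       for t in features},
        }
    return model

def classify(model, page):
    """Return the label with the highest log-posterior score."""
    counts = weighted_counts(page)
    scores = {}
    for label, params in model.items():
        score = params["prior"]
        for tok, w in counts.items():
            if tok in params["loglik"]:
                score += w * params["loglik"][tok]
        scores[label] = score
    return max(scores, key=scores.get)

# Toy, made-up training pages (already tokenized).
train_pages = [
    {"title": ["casino", "bonus"], "body": ["win", "cash", "now", "casino"]},
    {"title": ["free", "cash"], "body": ["casino", "bonus", "click"]},
    {"title": ["weather", "report"], "body": ["rain", "tomorrow", "city"]},
    {"title": ["city", "news"], "body": ["council", "report", "weather"]},
]
train_labels = ["spam", "spam", "normal", "normal"]

features = select_features(train_pages, train_labels, per_class=3)
model = train_nb(train_pages, train_labels, features)
print(classify(model, {"title": ["casino"], "body": ["cash", "bonus"]}))
```

Giving each class the same feature quota keeps a large class from crowding a small one out of the feature set, which is one plausible motivation for an equal-proportion scheme. The thesis may define both the selection rule and the weights differently.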
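【Degree-granting institution】: University of Electronic Science and Technology of China (电子科技大学)
【Degree level】: Master's
【Year degree granted】: 2014
【Classification number】: TP393.09; TP391.1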
Article ID: 2040343
Article link: https://www.wllwen.com/guanlilunwen/ydhl/2040343.html