
Design and Implementation of a Network Content Filtering System

Published: 2018-11-03 17:23
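[Abstract]: The campus network brings convenience to teachers and students, but it also brings harm: large amounts of unhealthy and useless information flood the online world, which poses a major challenge to the management and maintenance of university campus networks. Network content filtering is an effective countermeasure that automatically filters specific information out of network traffic. This thesis first reviews the state of network filtering research in China and abroad, its open problems, and common filtering methods. The system implements two key functional modules: a packet capture and reassembly module, and a network text data processing module. Together they provide the two core functions of the content filtering system: filtering specific URLs, and filtering the body text of web pages. Here "body text" means textual content only and excludes multimedia such as images and video.

The packet capture module centers on network protocol parsing, covering Ethernet frames, IP packets, TCP segments, and HTTP messages. Building on this protocol analysis, packets are captured and analyzed on Windows with the WinPcap packet capture library. The module implements URL filtering and HTML page reassembly, and supplies text data to the text processing module. To suit the characteristics of a campus network, the URL filter database can consist of several user-defined rule libraries, and different rule libraries can be applied in different time periods.

The text processing module studies web page text classification. Because web page text is semi-structured, the thesis first studies and implements the extraction of text from web pages. It then concentrates on text classification, whose two main technical difficulties are text preprocessing and classifier training; preprocessing in turn involves Chinese word segmentation, feature selection, and term weighting. After a theoretical analysis and comparison of the mainstream text classifiers, the class center vector classifier was chosen in view of the characteristics of the campus network. The classifier was trained on the training-set texts and evaluated by cross-validation, with satisfactory classification results.

Finally, the thesis summarizes the network content filtering system and outlines future work: a more comprehensive system that filters not only text but also multimedia such as images, audio, and video.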
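The protocol parsing described above (Ethernet frame, then IP packet, then TCP segment, then HTTP message) can be sketched in Python. This is a minimal illustration, not the thesis's WinPcap-based code: it assumes the raw bytes of one captured frame are already available (as a capture library such as WinPcap delivers them), an untagged Ethernet II header, IPv4, and an HTTP request whose request line and `Host` header fit in a single TCP segment. The function name `extract_http_url` is hypothetical.

```python
import struct
from typing import Optional

ETHERTYPE_IPV4 = 0x0800
IPPROTO_TCP = 6


def extract_http_url(frame: bytes) -> Optional[str]:
    """Return "host/path" for an HTTP request carried in one Ethernet II frame, else None.

    Hypothetical helper. Assumes untagged Ethernet II, IPv4, an origin-form
    request target (e.g. "GET /path"), and a request line plus Host header
    that fit in a single TCP segment (no reassembly).
    """
    # Ethernet II: 6-byte destination MAC, 6-byte source MAC, 2-byte EtherType
    if len(frame) < 14:
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_IPV4:
        return None

    # IPv4: low nibble of byte 0 is the header length in 32-bit words;
    # bytes 2-3 hold the total length, which lets us ignore Ethernet padding
    ip = frame[14:]
    if len(ip) < 20 or ip[0] >> 4 != 4:
        return None
    ihl = (ip[0] & 0x0F) * 4
    (total_len,) = struct.unpack("!H", ip[2:4])
    if ihl < 20 or ip[9] != IPPROTO_TCP:
        return None

    # TCP: high nibble of byte 12 is the data offset in 32-bit words
    tcp = ip[ihl:total_len]
    if len(tcp) < 20:
        return None
    payload = tcp[(tcp[12] >> 4) * 4:]

    # HTTP request line: method SP request-target SP HTTP-version
    head = payload.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    lines = head.split("\r\n")
    parts = lines[0].split(" ")
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        return None

    # The Host header supplies the domain part of the URL
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "host":
            return value.strip() + parts[1]
    return None
```

Requests split across several segments would need the TCP reassembly the thesis mentions, typically keyed on the connection 4-tuple and ordered by sequence number; that step is omitted here.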
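The time-dependent URL rule libraries can be modeled as named lists of blocked URL prefixes, each active during a daily time window. The abstract does not specify the rule format or the matching method, so the `RuleLibrary` structure, the prefix matching, the midnight-wrapping windows, and the example libraries below are all assumptions made for illustration. URLs are taken in the "host/path" form.

```python
from dataclasses import dataclass
from datetime import time
from typing import List, Tuple


@dataclass(frozen=True)
class RuleLibrary:
    """Hypothetical rule library: blocked URL prefixes active in one daily time window."""
    name: str
    start: time                        # window start (inclusive)
    end: time                          # window end (exclusive)
    blocked_prefixes: Tuple[str, ...]

    def is_active(self, now: time) -> bool:
        if self.start < self.end:
            return self.start <= now < self.end
        # end <= start: the window wraps past midnight (start == end means all day)
        return now >= self.start or now < self.end

    def matches(self, url: str) -> bool:
        url = url.lower()
        return any(url.startswith(p.lower()) for p in self.blocked_prefixes)


def is_blocked(url: str, now: time, libraries: List[RuleLibrary]) -> bool:
    """Block the URL if any library active at `now` contains a matching prefix."""
    return any(lib.is_active(now) and lib.matches(url) for lib in libraries)


# Example configuration (hypothetical hosts and schedules)
EXAMPLE_LIBRARIES = [
    RuleLibrary("all-day", time(0, 0), time(0, 0), ("bad.example/",)),
    RuleLibrary("class-hours", time(8, 0), time(17, 0), ("games.example/",)),
    RuleLibrary("night", time(23, 0), time(7, 0), ("video.example/",)),
]
```

Keeping each schedule in its own library means an administrator can switch a whole category on or off by editing one time window instead of touching individual URL rules.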
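Because web pages are semi-structured, the body text has to be pulled out of the HTML before it can be classified. A minimal sketch using Python's standard `html.parser` module follows; it assumes that all text outside `<script>`, `<style>`, and `<title>` counts as body text. The thesis's actual extraction rules are not given in the abstract, and the names `TextExtractor` and `extract_text` are hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of non-body-text elements."""

    SKIP_TAGS = {"script", "style", "title"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; etc. arrive decoded
        self._skip_depth = 0
        self._chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:  # html.parser reports tag names in lowercase
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self._chunks.append(text)

    def text(self) -> str:
        return " ".join(self._chunks)


def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Joining chunks with spaces can split a word that straddles an inline tag, which matters for Chinese text fed to word segmentation; a production extractor would need a smarter join.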
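The classification stage (term weighting, a class center vector classifier, and cross-validation) can be sketched as follows. Assumptions: documents arrive already segmented into word lists (Chinese word segmentation happens upstream and is not shown), feature selection is omitted, terms are weighted with TF-IDF using a smoothed idf of log(N/df) + 1, each class center is the mean of that class's L2-normalized training vectors, and a document goes to the class whose center has the highest cosine similarity. The abstract does not state the thesis's exact weighting formula or fold scheme; all function names and the labels in the usage note are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # sparse vector: term -> weight


def fit_idf(docs: List[List[str]]) -> Dict[str, float]:
    """Smoothed inverse document frequency: log(N / df) + 1."""
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log(len(docs) / d) + 1.0 for t, d in df.items()}


def vectorize(doc: List[str], idf: Dict[str, float]) -> Vector:
    """TF-IDF weights for one segmented document, L2-normalized; unseen terms are dropped."""
    tf = Counter(t for t in doc if t in idf)
    vec = {t: count * idf[t] for t, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train(docs: List[List[str]], labels: List[str]) -> Tuple[Dict[str, float], Dict[str, Vector]]:
    """Class center vectors: the mean of each class's normalized TF-IDF vectors."""
    idf = fit_idf(docs)
    sums: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    counts = Counter(labels)
    for doc, label in zip(docs, labels):
        for t, w in vectorize(doc, idf).items():
            sums[label][t] += w
    centroids = {label: {t: w / counts[label] for t, w in s.items()} for label, s in sums.items()}
    return idf, centroids


def classify(doc: List[str], idf: Dict[str, float], centroids: Dict[str, Vector]) -> str:
    """Assign the class whose center vector is most cosine-similar to the document."""
    vec = vectorize(doc, idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


def cross_validate(docs: List[List[str]], labels: List[str], k: int = 5) -> float:
    """Mean accuracy over k folds; fold i holds out every k-th document starting at i."""
    if not 2 <= k <= len(docs):
        raise ValueError("k must be between 2 and the number of documents")
    accuracies = []
    for i in range(k):
        held_out = range(i, len(docs), k)
        train_idx = [j for j in range(len(docs)) if j % k != i]
        # idf and centers come from the training folds only, so nothing leaks from held-out docs
        idf, centroids = train([docs[j] for j in train_idx], [labels[j] for j in train_idx])
        correct = sum(classify(docs[j], idf, centroids) == labels[j] for j in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k
```

For example, with segmented documents labeled "block" or "allow", `train` returns the idf table and one center per label, and `classify(["免费", "游戏"], idf, centroids)` picks the label whose center shares the most weighted terms. A centroid classifier stores only one vector per class, so classification cost grows with the number of classes rather than the size of the training set.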
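[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year awarded]: 2014
[Classification number]: TP393.08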
