NTCI-Flow: An Extensible High-Speed Network Traffic Processing Framework
Published: 2018-09-10 14:16
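[Abstract]: Current software- and hardware-based flow export technologies suffer from problems such as data distortion and poor extensibility. To address them, this paper proposes NTCI-Flow, an accurate, general-purpose, and easily extensible framework for processing high-speed network traffic. First, high-performance packet capture is built on PF_RING DNA. Packets are grouped and dispatched by a load-balancing strategy based on the packet five-tuple, and batching, lock-free queues, and multithreading are used to pack multiple packets into a single large message that is sent in parallel, improving packet-forwarding performance. Second, the Kafka messaging system serves as middleware that receives and buffers the packets, enabling distributed packet ingestion. Third, a real-time stream-processing platform is built on Storm, and a distributed flow-reassembly application is developed and deployed on it: the application reads packets from Kafka, parses each one to extract the five-tuple, packet size, timestamp, and other fields, and reassembles them into network flows. Finally, a Hive flow-data import module stores the exported flow data in HDFS in Parquet format in real time, uses the Hive Metastore to store and manage metadata, and applies a time-based dynamic partitioning mechanism to avoid unnecessary disk I/O when data are retrieved by time.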
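Five-tuple-based dispatch can be sketched as hashing the packed tuple and taking the result modulo the number of worker queues. This is a minimal sketch, assuming a CRC32 hash; the abstract does not name the hash function, and `FiveTuple`, `tuple_key`, and `pick_queue` are hypothetical names. The property that matters is that every packet with the same five-tuple always lands in the same queue, so a single downstream worker sees the whole flow.

```python
import ipaddress
import struct
import zlib
from typing import NamedTuple


class FiveTuple(NamedTuple):
    """Hypothetical container for the fields that identify a flow."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: int  # IP protocol number, e.g. 6 for TCP, 17 for UDP


def tuple_key(t: FiveTuple) -> bytes:
    """Serialize the five-tuple into a canonical byte string (network byte order)."""
    return (
        ipaddress.ip_address(t.src_ip).packed
        + ipaddress.ip_address(t.dst_ip).packed
        + struct.pack("!HHB", t.src_port, t.dst_port, t.proto)
    )


def pick_queue(t: FiveTuple, n_queues: int) -> int:
    """Choose a queue index for a packet.

    CRC32 gives the same answer in every process, unlike Python's built-in
    hash(), which is salted per process for str and bytes.
    """
    return zlib.crc32(tuple_key(t)) % n_queues


if __name__ == "__main__":
    pkt = FiveTuple("192.0.2.10", "198.51.100.7", 51514, 443, 6)
    print(pick_queue(pkt, 8))
```

This key is direction-sensitive, so each unidirectional flow stays on one queue; whether NTCI-Flow also maps both directions of a connection to the same queue is not stated in the abstract.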
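Packing many small packets into one large message amortizes per-message overhead in the sender and in Kafka. The sketch below assumes a simple length-prefixed layout (a 4-byte packet count, then a 4-byte length before each packet); the paper's actual message format is not given in the abstract, and `pack_batch`, `unpack_batch`, and `Batcher` are hypothetical. It covers only the batching step: the lock-free queue and the parallel sender threads are left out, and Python's standard `queue.Queue` is lock-based, so it could serve only as a stand-in for that part.

```python
import struct
from typing import Iterable, List, Optional


def pack_batch(packets: Iterable[bytes]) -> bytes:
    """Encode packets as: u32 count, then (u32 length, payload) per packet, big-endian."""
    packets = list(packets)
    parts = [struct.pack("!I", len(packets))]
    for p in packets:
        parts.append(struct.pack("!I", len(p)))
        parts.append(p)
    return b"".join(parts)


def unpack_batch(msg: bytes) -> List[bytes]:
    """Inverse of pack_batch, as a consumer such as the flow-reassembly stage would use it."""
    (count,) = struct.unpack_from("!I", msg, 0)
    offset = 4
    packets = []
    for _ in range(count):
        (length,) = struct.unpack_from("!I", msg, offset)
        offset += 4
        packets.append(msg[offset:offset + length])
        offset += length
    return packets


class Batcher:
    """Buffer packets and emit one message once batch_size packets are pending."""

    def __init__(self, batch_size: int = 256):  # batch size is an assumption
        self.batch_size = batch_size
        self.pending: List[bytes] = []

    def add(self, packet: bytes) -> Optional[bytes]:
        """Buffer one packet; return an encoded message when the batch is full."""
        self.pending.append(packet)
        if len(self.pending) >= self.batch_size:
            return self.flush()
        return None

    def flush(self) -> Optional[bytes]:
        """Emit whatever is buffered (e.g. when called from a timer) so a slow trickle is not held forever."""
        if not self.pending:
            return None
        msg = pack_batch(self.pending)
        self.pending = []
        return msg
```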
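The flow-reassembly step can be illustrated with a flow table keyed by five-tuple that accumulates packet and byte counts plus first and last timestamps, and exports a flow record once the flow has been idle longer than a timeout. This is a single-process sketch, not the authors' Storm topology; the 15-second idle timeout and the `FlowRecord`/`FlowTable` names are assumptions. In a Storm deployment this state would plausibly live in a bolt that receives packets via a fields grouping on the five-tuple, so each flow's packets reach the same task.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# (src_ip, dst_ip, src_port, dst_port, proto)
Key = Tuple[str, str, int, int, int]


@dataclass
class FlowRecord:
    key: Key
    first_ts: float  # timestamp of the first packet, in seconds
    last_ts: float   # timestamp of the most recent packet
    packets: int = 0
    octets: int = 0


class FlowTable:
    """Aggregate packets into flows; export flows idle for more than idle_timeout seconds."""

    def __init__(self, idle_timeout: float = 15.0):  # timeout value is an assumption
        self.idle_timeout = idle_timeout
        self.active: Dict[Key, FlowRecord] = {}

    def add_packet(self, key: Key, size: int, ts: float) -> List[FlowRecord]:
        """Account one packet and return any flows that expired as of ts."""
        expired = self.expire(ts)
        rec = self.active.get(key)
        if rec is None:
            rec = FlowRecord(key, first_ts=ts, last_ts=ts)
            self.active[key] = rec
        rec.last_ts = ts
        rec.packets += 1
        rec.octets += size
        return expired

    def expire(self, now: float) -> List[FlowRecord]:
        """Remove and return idle flows. Linear scan for clarity; a production
        version would run this on a periodic timer instead of per packet."""
        done = [r for r in self.active.values() if now - r.last_ts > self.idle_timeout]
        for r in done:
            del self.active[r.key]
        return done

    def flush(self) -> List[FlowRecord]:
        """Export every remaining flow, e.g. at shutdown."""
        done = list(self.active.values())
        self.active.clear()
        return done
```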
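Time-based partitioning lets a query over a time window read only the matching directories. A minimal sketch, assuming hourly partitions laid out in Hive's `key=value` directory style as `dt=YYYY-MM-DD/hour=HH` with UTC timestamps; the abstract does not give the partition columns or granularity, so those names and both helper functions are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def partition_for(ts: float) -> str:
    """Partition directory for a flow record with Unix timestamp ts (UTC)."""
    t = datetime.fromtimestamp(ts, tz=timezone.utc)
    return f"dt={t:%Y-%m-%d}/hour={t:%H}"


def partitions_in_range(start: float, end: float) -> List[str]:
    """Partitions a query over [start, end) has to scan; all others can be skipped."""
    # Round the start down to the beginning of its hour, then step hour by hour.
    t = datetime.fromtimestamp(start, tz=timezone.utc).replace(minute=0, second=0, microsecond=0)
    stop = datetime.fromtimestamp(end, tz=timezone.utc)
    out = []
    while t < stop:
        out.append(f"dt={t:%Y-%m-%d}/hour={t:%H}")
        t += timedelta(hours=1)
    return out
```

In Hive itself the same pruning happens when a query filters on the partition columns (for example `WHERE dt = '2018-09-10' AND hour = '14'`), so only those directories are read.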
Experimental results show that the traffic-capture module accurately captures and forwards 10 Gbps traffic, keeping the packet loss rate to only 0.03% even when the entire 10 Gbps load consists of minimum-size (60-byte) packets. The throughput of the traffic-ingestion module depends on disk write performance and reaches 775 MB/s when 7 hard disks are used to buffer data. The distributed flow-reassembly module generalizes and scales well, reaching a throughput of 1.26 × 10^7 packets/s with simple configuration.
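For scale, the minimum-size test can be converted to a packet rate. Assuming the 60-byte figure excludes the 4-byte Ethernet FCS, each frame also occupies 8 bytes of preamble and start-of-frame delimiter plus a 12-byte inter-frame gap on the wire, i.e. 84 bytes in total. The last line converts the ingestion throughput, taking 1 MB as 10^6 bytes.

```latex
\[
R_{\text{pkt}} = \frac{10 \times 10^{9}\ \text{bit/s}}{(60 + 4 + 8 + 12)\ \text{B} \times 8\ \text{bit/B}}
             = \frac{10^{10}}{672}\ \text{packets/s} \approx 1.488 \times 10^{7}\ \text{packets/s}
\]
\[
0.03\% \times 1.488 \times 10^{7}\ \text{packets/s} \approx 4.46 \times 10^{3}\ \text{packets/s lost}
\]
\[
775\ \text{MB/s} \times 8\ \text{bit/B} = 6.2\ \text{Gbit/s}
\]
```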
NTCI-Flow is currently used to capture and process the egress traffic of an organization whose traffic averages about 3.5 Gbps and peaks at 6 Gbps, with packet rates reaching the order of one million packets per second. In this deployment NTCI-Flow runs well, and the flow data it produces are more accurate than those obtained from Net Stream.
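[Author affiliation]: College of Computer Science, Sichuan University
[Funding]: National Natural Science Foundation of China (grant 61272447)
[CLC number]: TP393.08
Article number: 2234683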
Article link: https://www.wllwen.com/guanlilunwen/ydhl/2234683.html