A Study of Network Throughput Collapse Behavior in Cluster Storage

Published: 2018-07-07 22:34

Topics: cluster storage + network throughput; Source: Huazhong University of Science and Technology, 2012 doctoral dissertation

[Abstract]: Cluster storage systems are widely used in data centers of the cloud storage era because of their low cost and easy scalability. Data centers build cluster storage systems on high-speed TCP/Ethernet networks, with multiple storage nodes serving data access requests at the same time. Because the load on a cluster storage network is synchronized and concurrent, client bandwidth utilization drops sharply as the number of cluster nodes grows when the nodes respond to a client's data request. The client's actual network throughput falls to only about 20% of its normal level, which wastes much of the data center's network bandwidth and increases the client's data access latency. Researchers call this TCP throughput collapse under the many-to-one communication pattern of cluster storage networks Incast.
First, the Incast phenomenon is reproduced through simulation and through experiments on a real cluster, showing from both microscopic and macroscopic perspectives that Incast is pervasive in cluster storage networks, and its causes are analyzed through quantitative modeling. Analysis of the experimental traces shows that the main cause of Incast is TCP timeouts. Existing congestion control mechanisms and TCP implementations cannot perform to their full potential in the cluster storage network environment, and the data storage strategy and the high concurrency of the application load aggravate TCP timeouts, and with them the negative effect of Incast on actual throughput. The quantitative model answers why existing TCP congestion control mechanisms fall short in cluster storage networks, explains that the sharp drop in actual throughput is caused by consecutive timeouts triggered by transient bursty packet loss, and allows both the probability of a timeout and the actual throughput to be estimated. These results provide a theoretical basis for exploring optimizations and solutions for Incast.
Second, based on the quantitative analysis of the causes of Incast, the implementation of the TCP minimum retransmission timeout timer (RTOmin) is optimized. On the one hand, this prevents the timeout penalty of an overly large RTOmin; on the other, it avoids the spurious retransmissions triggered by an overly small RTOmin. Because existing TCP congestion control is designed for general applicability, the granularity of RTOmin in protocol implementations is too coarse to meet the needs of today's high-speed cluster storage networks. In Linux kernels after version 2.6.18, which added kernel support for high-resolution timers, optimizing RTOmin in the TCP implementation balances the effects of TCP timeouts and spurious retransmissions on cluster storage network throughput.
Third, load control is applied at the application layer to limit the burst load rate of each storage node during synchronized reads, avoiding the TCP timeouts caused by transient bursty packet loss in cluster storage network transfers and thereby resolving the Incast problem. With the support of the Linux kernel's network interface load control module, control scripts pass the load control parameters, capping the maximum load rate of synchronized concurrent transfers from multiple storage nodes. This prevents network congestion from building up and so avoids the repeated TCP timeouts caused by transient bursty packet loss events. The core idea of the load control strategy is that the storage nodes participating in a synchronized transfer share the bandwidth of the bottleneck link equally; that is, the maximum load rate of each synchronized flow must not exceed the average bandwidth it should be allocated in the cluster.
Finally, for a distributed continuous data protection system with typical Incast load characteristics, the Incast behavior of its network load and the load characteristics of its local disks are analyzed. For network Incast, RTOmin optimization and load control are combined to raise the client's actual throughput and reduce the network transfer time of data requests. For the local disk load on the parity-check server, a buffer-chain strategy reduces the number of local disk I/O operations the server performs and shortens the waiting time for parity computation. Together, these two optimizations improve the network transfer efficiency and local I/O performance of the cluster storage system and reduce the overall response time of user data services. By analyzing the causes of Incast and studying solutions to it, this work helps ensure high-quality storage services in large-scale enterprise data centers.
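As a rough illustration of two ideas in the abstract, the sketch below computes the per-node rate cap implied by equal sharing of the bottleneck link, and uses a toy stall model to show why a timeout that is long relative to the transfer time collapses goodput. The function names, the 1 MB block size, and the model itself (transfer time plus a fixed stall per timeout) are assumptions made for illustration; this is not the quantitative model developed in the dissertation.

```python
def fair_share_rate_mbps(bottleneck_mbps: float, n_senders: int) -> float:
    """Per-sender rate cap when n_senders share the bottleneck link equally."""
    if n_senders <= 0:
        raise ValueError("n_senders must be positive")
    return bottleneck_mbps / n_senders


def goodput_mbps(block_bytes: int, link_mbps: float,
                 rto_ms: float, timeouts: int) -> float:
    """Toy model (an assumption, not the thesis model): the client receives
    one block at line rate, but sits idle for rto_ms on every timeout."""
    bits = block_bytes * 8
    transfer_s = bits / (link_mbps * 1e6)   # ideal transfer time at line rate
    stall_s = timeouts * rto_ms / 1000.0    # idle time spent waiting on RTO
    return bits / (transfer_s + stall_s) / 1e6


if __name__ == "__main__":
    # Hypothetical setup: 1 Gbps bottleneck link, 8 storage nodes, 1 MB block.
    print("rate cap per node (Mbps):", fair_share_rate_mbps(1000.0, 8))
    for rto_ms in (200.0, 1.0):             # 200 ms is the usual Linux minimum RTO
        print("RTO", rto_ms, "ms, one timeout ->",
              round(goodput_mbps(1_000_000, 1000.0, rto_ms, 1), 1), "Mbps")
```

In this toy setting a single 200 ms stall dominates the 8 ms transfer, which is the intuition behind both shrinking RTOmin and capping each sender's rate so that the bursty loss does not occur in the first place.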
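[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Doctoral
[Year conferred]: 2012
[Classification number]: TP333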

[References]