Research on Thread Scheduling Policies for Multiprocessor Systems

Published: 2018-03-10 05:06

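Topic: SMP　Focus: Linux process scheduling　Source: University of Electronic Science and Technology of China (电子科技大学), master's thesis, 2012　Type: degree thesis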

[Abstract]: In SMP systems that combine chip multiprocessing (CMP) and simultaneous multithreading (SMT), a poorly designed resource-sharing mechanism can lead to uncontrolled contention among concurrent threads for shared resources, which lowers system throughput and resource utilization. Designing and implementing SMP scheduling mechanisms that optimize the use of system resources has therefore long been an important direction in operating systems research. This thesis first introduces the evolution and characteristics of multiprocessor and multi-core architectures, and then reviews and summarizes research on thread scheduling policies for those architectures. It next outlines the development of the Linux scheduler and the characteristics of the scheduler in each version. Through an in-depth analysis of the CFS scheduler framework still used in the Linux 3.0 era and of the load balancing implementation based on scheduling domains, it identifies a shortcoming of the Linux scheduler: load balancing considers only the workload of each CPU, and balancing operations that ignore bus bandwidth usage can reduce the effective utilization of the bus. For example, a process with excessive bandwidth demand may be migrated, run freely, and exhaust the available bus bandwidth; conversely, a process with reasonable bandwidth demand may never be migrated off an overloaded CPU and so be starved of execution opportunities. In both cases the bus bandwidth is not used effectively. Finally, the thesis proposes an SMP scheduling policy that takes bus bandwidth usage into account, together with a method for improving the current Linux scheduler. The idea is to use performance counts collected while a thread runs to evaluate its bus bandwidth usage within the most recent sampling time window, and to use this as a direct basis for scheduling decisions. The hardware performance counters built into the processor provide the number of instructions a thread effectively executes and the number of cache misses at each cache level, from which the average bus bandwidth used within the sampling window is calculated. Because the control is based on the history of bus usage, the choice of sampling period and time window is critical. If the window is too narrow, it cannot capture a thread's recent average bandwidth usage; if it is too wide, bandwidth-aware scheduling decisions come at the wrong time and may miss the opportunity to head off a contention peak. In the implementation, a suitable window value was determined through repeated testing. In optimizing the original SMP load balancing algorithm, processes are selected for migration based not only on CPU/cache affinity but also on their bus bandwidth usage. Analysis of three sets of test results obtained with the STREAM benchmark shows that the improved scheme optimizes bus bandwidth usage and raises the effective utilization of the bus without affecting the CPU load balancing achieved by the original algorithm.
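The abstract does not give the bandwidth formula or the migration rule, so the following Python sketch is only an illustration of the general idea, not the thesis's implementation. It assumes that each last-level cache miss moves one 64-byte cache line over the bus (ignoring write-backs, prefetching, and the per-level miss counts and instruction counts the thesis also collects), and that a migration candidate must both be cache-cold and fit within the remaining bus bandwidth. The names `BandwidthWindow`, `Task`, `pick_migration_candidate`, and `bus_headroom` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass

CACHE_LINE_BYTES = 64  # assumption: one LLC miss transfers one 64-byte line


class BandwidthWindow:
    """Sliding window of (duration_s, llc_misses) samples for one thread."""

    def __init__(self, max_samples):
        # deque with maxlen drops the oldest sample once the window is full
        self.samples = deque(maxlen=max_samples)

    def add_sample(self, duration_s, llc_misses):
        self.samples.append((duration_s, llc_misses))

    def average_bandwidth(self):
        """Estimated bytes per second over the window (0.0 if no data)."""
        total_time = sum(d for d, _ in self.samples)
        if total_time <= 0:
            return 0.0
        total_misses = sum(m for _, m in self.samples)
        return total_misses * CACHE_LINE_BYTES / total_time


@dataclass
class Task:
    name: str
    cache_hot: bool          # stand-in for the CPU/cache affinity check
    window: BandwidthWindow  # recent bandwidth history of this task


def pick_migration_candidate(tasks, bus_headroom):
    """Pick a task to migrate off an overloaded CPU (hypothetical policy).

    Skips cache-hot tasks and tasks whose estimated bandwidth exceeds the
    remaining bus bandwidth (bytes/s); among the rest, returns the one with
    the highest estimated bandwidth, or None if no task qualifies.
    """
    eligible = [
        t for t in tasks
        if not t.cache_hot and t.window.average_bandwidth() <= bus_headroom
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda t: t.window.average_bandwidth())
```

The `max_samples` parameter plays the role of the time window width discussed in the abstract: a small value makes the estimate follow short bursts, while a large value smooths it but reacts late to a change in a thread's behaviour.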
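[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year conferred]: 2012
[Classification number]: TP332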

[Citing literature]