Design and Implementation of an Optimized Job Scheduling Model for High-Performance Computing Environments

Published: 2018-12-19 20:23

[Abstract]: A high-performance computing (HPC) environment aggregates HPC resources distributed across different regions and organizations and offers users a unified access point and usage model; system middleware matches each user job request to a suitable HPC resource. As the environment's application programming interface has been opened and the number of job requests has grown sharply, the instant scheduling model currently in use fails to process a certain number of requests under highly concurrent job submission, for network and other reasons, and it also lacks flexibility. To address this problem, the environment's job scheduling model is optimized: an environment-level job queue is introduced, the system-layer job states are refined, the job scheduling policy is made configurable, and a system prototype is implemented on the environment middleware SCE. In testing, under a workload in which a single core service processed nearly 200 job submission requests per minute, no job submission errors caused by system or network problems occurred; of 1,000 jobs in total, nearly 500 job submission command requests completed within 0.3 s, and more than 800 completed within 0.5 s.
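
The three changes named in the abstract (an environment-level queue, finer system-layer job states, and a configurable scheduling policy) can be illustrated with a minimal Python sketch. It is not the SCE implementation. The state names, the `Job` fields, the two example policies, and the retry-on-network-error behavior are all assumptions made for illustration, since the abstract does not specify them.

```python
import heapq
import itertools
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional, Tuple


class JobState(Enum):
    # Hypothetical system-layer states; the abstract does not list the real ones.
    QUEUED = "queued"            # accepted by the environment, waiting in the queue
    DISPATCHING = "dispatching"  # being sent to an HPC resource
    SUBMITTED = "submitted"      # accepted by the resource
    FAILED = "failed"            # gave up after repeated dispatch errors


@dataclass
class Job:
    job_id: int
    user: str
    cores: int
    state: JobState = JobState.QUEUED
    attempts: int = 0


# A policy maps a job to a sort key; jobs with smaller keys are dispatched first.
Policy = Callable[[Job], Tuple]


def fifo(job: Job) -> Tuple:
    return ()  # empty key: order falls back to arrival sequence


def smallest_first(job: Job) -> Tuple:
    return (job.cores,)  # fewer requested cores first


class EnvironmentQueue:
    """Environment-level queue that decouples accepting a job from dispatching it."""

    def __init__(self, policy: Policy = fifo, max_attempts: int = 3) -> None:
        self._policy = policy
        self._max_attempts = max_attempts
        self._heap: List[Tuple[Tuple, int, Job]] = []
        self._seq = itertools.count()  # tie-breaker that preserves arrival order

    def submit(self, job: Job) -> None:
        # Accepting a job only touches local state, so no network call can fail here.
        job.state = JobState.QUEUED
        heapq.heappush(self._heap, (self._policy(job), next(self._seq), job))

    def dispatch_one(self, send: Callable[[Job], bool]) -> Optional[Job]:
        # Pop the best job under the configured policy and try to send it.
        if not self._heap:
            return None
        _, _, job = heapq.heappop(self._heap)
        job.state = JobState.DISPATCHING
        job.attempts += 1
        try:
            ok = send(job)
        except OSError:  # network-style errors (assumed failure mode)
            ok = False
        if ok:
            job.state = JobState.SUBMITTED
        elif job.attempts < self._max_attempts:
            self.submit(job)  # requeue instead of failing the user's request
        else:
            job.state = JobState.FAILED
        return job
```

In this sketch, changing the scheduling policy means passing a different key function, for example `EnvironmentQueue(policy=smallest_first)`, while `send` stands in for whatever call actually hands the job to a resource.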
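[Affiliation]: Computer Network Information Center, Chinese Academy of Sciences
[Funding]: National Key Research and Development Program of China (2016YFB0201404); Major Project of the National High-Tech R&D Program of China (863 Program) under the 12th Five-Year Plan (2014AA01A302)
[Classification Number]: TP38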
