Abstract
The high performance computing (HPC) environment aggregates multiple HPC resources distributed across different regions and organizations, and provides users with a unified access entry point and usage pattern. The system middleware matches each user's job request to appropriate HPC resources. As the environment's application programming interface (API) has been opened and the number of job requests has grown substantially, the immediate scheduling model currently in use causes a certain number of requests to fail under highly concurrent job submission, owing to network-related causes such as too many simultaneous connections. This model also lacks flexibility. To address these problems, we optimize the job scheduling model of the environment: we introduce environment job queues, refine the system-level states of each job, and make the job scheduling strategy configurable. We also implement a prototype system based on the environment middleware SCE. Test results show that, under a workload in which a single core service handles nearly 200 job submission requests per minute, no job submission errors caused by system or network problems occur. Of 1,000 jobs in total, nearly 500 job submission requests complete within 0.3 s, and more than 800 complete within 0.5 s.
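The abstract does not describe the model's internals, so the following Python code is only a minimal sketch, under stated assumptions, of the general idea it names: requests are accepted into an environment job queue instead of being dispatched immediately, each job carries an explicit system-level state, transient network failures are retried a bounded number of times, and the scheduling strategy is a swappable function. Every name here (`EnvironmentJobQueue`, `JobState` and its members, `most_free_cores`, `first_fit`, `max_attempts`, `send`) is hypothetical and is not SCE's actual API or state set.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Deque, Dict


class JobState(Enum):
    # Hypothetical system-level states; the paper's real state set is not given.
    QUEUED = "queued"            # accepted into the environment job queue
    DISPATCHING = "dispatching"  # being forwarded to a selected HPC resource
    SUBMITTED = "submitted"      # accepted by the resource side
    FAILED = "failed"            # gave up after exhausting retries


@dataclass
class Job:
    job_id: int
    cores: int
    state: JobState = JobState.QUEUED
    resource: str = ""
    attempts: int = 0


# A scheduling policy maps (job, free cores per resource) to a resource name.
Policy = Callable[[Job, Dict[str, int]], str]


def most_free_cores(job: Job, free: Dict[str, int]) -> str:
    # Hypothetical policy: pick the resource with the most free cores.
    return max(free, key=free.get)


def first_fit(job: Job, free: Dict[str, int]) -> str:
    # Hypothetical policy: first resource with enough cores, else the largest.
    for name, n in free.items():
        if n >= job.cores:
            return name
    return max(free, key=free.get)


class EnvironmentJobQueue:
    def __init__(self, free: Dict[str, int], policy: Policy, max_attempts: int = 3):
        self.free = dict(free)          # free cores per resource (no overcommit check)
        self.policy = policy            # configurable scheduling strategy
        self.max_attempts = max_attempts
        self.pending: Deque[Job] = deque()
        self.jobs: Dict[int, Job] = {}

    def submit(self, job_id: int, cores: int) -> JobState:
        # Only enqueues the job; no remote call happens on the request path.
        job = Job(job_id, cores)
        self.jobs[job_id] = job
        self.pending.append(job)
        return job.state

    def dispatch_once(self, send: Callable[[Job, str], bool]) -> None:
        # One pass over the jobs queued at call time; send() stands in for the
        # remote submission and returns False on a transient error.
        for _ in range(len(self.pending)):
            job = self.pending.popleft()
            job.state = JobState.DISPATCHING
            job.resource = self.policy(job, self.free)
            job.attempts += 1
            if send(job, job.resource):
                job.state = JobState.SUBMITTED
                self.free[job.resource] -= job.cores
            elif job.attempts < self.max_attempts:
                job.state = JobState.QUEUED   # re-queue for the next pass
                self.pending.append(job)
            else:
                job.state = JobState.FAILED


# Usage: job 2's first remote submission fails once, then succeeds on retry.
def send(job: Job, resource: str) -> bool:
    return not (job.job_id == 2 and job.attempts == 1)


q = EnvironmentJobQueue({"siteA": 64, "siteB": 128}, policy=most_free_cores)
for i in range(1, 4):
    q.submit(i, cores=16)
q.dispatch_once(send)
q.dispatch_once(send)
```

Because `submit()` only appends to an in-memory queue, the request path never waits on a remote site; this is one plausible way a queued model can avoid connection-related failures, though the abstract does not say how SCE achieves its measured latencies. Swapping `most_free_cores` for `first_fit` changes placement without touching the queue logic.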
Source
Computer Engineering & Science
CSCD
Peking University Core Journal
2017, Issue 4, pp. 619-626 (8 pages)
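Funding
National Key Research and Development Program of China (2016YFB0201404)
Major Project of the National High-tech R&D Program of China (863 Program) under the 12th Five-Year Plan (2014AA01A302)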
Keywords
CNGrid (China National Grid)
high performance computing environment
grid computing
cloud service
job scheduling