Abstract
High-performance computer architecture is evolving toward hyper-convergence, polymorphic composition, and self-adaptation. At the same time, node heterogeneity will remain the mainstream design for top-tier high-performance computers, the problem of matching computing architectures to applications is becoming increasingly prominent, and low utilization of cluster resources has become an urgent problem. This paper presents the design and implementation of a hyper-converged software framework for exascale (E-scale) computing. The system integrates application processing technologies for high-performance computing, deep learning, big data, and cloud computing. It builds a two-level resource scheduling mechanism that decouples resource scheduling from resource allocation, so that heterogeneous applications can be added flexibly. Lightweight container technology is used to package applications and isolate their runtime environments, enabling dynamic scaling and deployment management of applications. Data collection and analysis for the resources of the entire cluster are implemented on top of open protocols, which provides a basis for intelligent data analysis of the cluster, improves system resource utilization, and promotes efficient application execution.
Authors
戴荣
孙国忠
吕灼恒
秦晓宁
DAI Rong; SUN Guo-zhong; LV Zhuo-heng; QIN Xiao-ning (Dawning Information Industry (Beijing) Co., Ltd., Beijing 100000, China)
Source
《计算机仿真》
PKU Core Journal (北大核心)
2020, No. 7, pp. 234-238 (5 pages)
Computer Simulation
Funding
National Key Research and Development Program of China (2016YFB0200300).