Abstract
Clusters of workstations (COW) are becoming an important parallel computing platform and are attracting growing attention worldwide. This paper analyzes in detail the key techniques needed to build a practical cluster, namely inter-node communication, load balancing and scheduling strategies, parallel compilation, parallel debugging, fault recovery and fault tolerance, and parallel programming environments, together with recent advances in their research and development. It presents the research results for these techniques in TH-COW, the Tsinghua cluster of workstations, and compares them with related work. Finally, it discusses the applications and prospects of cluster computing.
Source
《清华大学学报(自然科学版)》
EI
CAS
CSCD
PKU Core
1998, Issue S1, pp. 18-25 (8 pages)
Journal of Tsinghua University (Science and Technology)
Funding
National "863" High-Tech Program project
Keywords
cluster computing
load balancing
communication mechanism
parallel compiling and debugging
fault recovery
parallel programming environment