Abstract
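Advances in modern semiconductor process technology make it possible to place hundreds of arithmetic units on a single chip, but global on-chip and off-chip bandwidth is limited. General-purpose processor architectures have not adapted well to this change; they still rely on global on-chip structures and a small number of arithmetic units. Stream architectures, in contrast, have a large number of arithmetic units and a distinct storage hierarchy, so that high local bandwidth can meet the demands of many arithmetic units under limited off-chip bandwidth. This paper first introduces the prototype MASA stream architecture, then presents a case study implementing the two-dimensional combined Lagrangian and Eulerian method (Ygx2) from detonation hydrodynamics on the stream architecture, and finally evaluates the application's performance with a cycle-accurate simulator. The results show that the Ygx2 application on a 500 MHz MASA runs nearly 4 times faster than on a 1.6 GHz Itanium2, confirming the great potential of stream architectures in high-performance computing.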
Modern semiconductor technology allows us to place hundreds of functional units on a single chip, but it provides only limited global on-chip and off-chip bandwidth. General-purpose processor architectures have not adapted to this change in the capabilities and constraints of the underlying technology; they still rely on global on-chip structures to operate a small number of functional units. Stream processors, on the other hand, have a large number of functional units and use multiple register hierarchies with high local bandwidth to match the bandwidth demands of the functional units to the limited available off-chip bandwidth. This paper describes the microarchitecture of MASA and presents the implementation of a fluid dynamics application (Ygx2) on it. We developed a cycle-accurate simulator to evaluate the performance. The results show that the application on a 500 MHz MASA outperforms a 1.6 GHz Itanium2 by a factor of nearly 4. This research confirms that stream architecture has the potential to deliver high performance.
Source
《国防科技大学学报》
EI
CAS
CSCD
Peking University Core Journals
2006, No. 4, pp. 43-48 (6 pages)
Journal of National University of Defense Technology
Funding
Supported by the National Natural Science Foundation of China (60573103)
Supported by the High-Performance Computing Innovation Team Fund (IRT0446)
Keywords
stream architecture
Ygx2
high performance computing