Bank Coherence Techniques for Solving the Data Competition Problem in CMP NUCA
Abstract: Non-Uniform Cache Architecture (NUCA) has almost become the standard direction for future large on-chip caches. In a chip multiprocessor (CMP) with NUCA, several processor cores may compete for the same shared data. This competition tends to leave the shared data in a central cache bank, which increases NUCA access latency. This paper proposes a bank coherence technique that supports data replicas: by selectively creating separate copies of shared data in the NUCA for the cores that access it, the technique alleviates competition among cores for shared data. The design of the bank coherence protocol is described in detail. Eight NPB benchmarks are then evaluated on a full-system simulator. The results show that bank coherence effectively alleviates competitive access to shared data in multi-core processors. Compared with a CMP-DNUCA design without bank coherence, the proposed approach improves system IPC by 5.95% on average.
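The abstract does not spell out the protocol itself, so the following is only a toy sketch of the effect it describes, not the authors' design. It assumes a hypothetical line of 8 banks with one core beside bank 0 and another beside bank 7, a made-up latency of one cycle plus one cycle per bank hop, and a simple "a write invalidates the other copies" rule. The function names and all parameters are illustrative assumptions.

```python
NUM_BANKS = 8  # hypothetical: banks laid out in a line, cores sit beside banks 0 and 7


def latency(core_bank: int, data_bank: int) -> int:
    """Toy latency model (assumption): 1 base cycle plus 1 cycle per bank hop."""
    return 1 + abs(core_bank - data_bank)


def run_migration_only(accesses, start_bank=NUM_BANKS // 2):
    """Single copy that migrates one bank toward each requester (D-NUCA-like)."""
    bank, total = start_bank, 0
    for core_bank, _is_write in accesses:
        total += latency(core_bank, bank)
        # After the access, the block moves one step toward the requester.
        if bank < core_bank:
            bank += 1
        elif bank > core_bank:
            bank -= 1
    return total


def run_with_replicas(accesses, start_bank=NUM_BANKS // 2):
    """Each reader gets its own copy in its local bank; a write keeps only the writer's copy."""
    copies, total = {start_bank}, 0
    for core_bank, is_write in accesses:
        nearest = min(copies, key=lambda b: abs(b - core_bank))
        total += latency(core_bank, nearest)
        if is_write:
            copies = {core_bank}   # invalidate every other copy
        else:
            copies.add(core_bank)  # create a replica next to the reader
    return total


# Two cores at opposite ends alternately read the same block.
reads = [(0, False), (7, False)] * 4
print("migration only:", run_migration_only(reads))
print("with replicas: ", run_with_replicas(reads))
```

In this toy model the single migrating copy keeps oscillating around the middle banks, so every access pays a multi-hop latency, whereas replicas let each core hit in its local bank after the first access. Writes remove that advantage until the replica is recreated, which is the cost a real coherence protocol has to manage.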
Source: Computer Engineering & Science (《计算机工程与科学》), indexed in CSCD and the Peking University Core Journals list, 2009, No. 11, pp. 21-24, 49 (5 pages)
Funding: National Natural Science Foundation of China (60621003, 60873014, 60633050)
Keywords: NUCA; data competition; multi-core; bank coherence; cache coherence