Hierarchical Cache Directory for CMP (cited by: 4)

Abstract: As more processing cores are integrated into one chip and feature size continues to shrink, the average access latency for remote nodes under a directory-based coherence protocol grows, which significantly degrades system performance. Previous techniques such as data replication and data migration optimize the performance of the requesting core but offer little improvement for neighbor nodes. Other techniques, such as in-transit optimization, reduce latency at the cost of increased storage. This paper introduces a hierarchical cache directory into the CMP (chip multiprocessor): it divides the CMP tiles hierarchically into multiple regions and combines this organization with data replication. A new directory organization is proposed to record the sharing status within a region and to help the regional home complete operations efficiently. Simulation results show that, for a 16-core CMP and compared with a traditional directory, the hierarchical cache directory reduces average access latency by 9% and on-chip network traffic by 34% on average, while using less storage. Theoretical analysis shows that, for a 2^n × 2^n tiled CMP, the average access latency of the hierarchical cache directory asymptotically approaches a function that is independent of n; hence the architecture is highly scalable.
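The region hierarchy described in the abstract can be illustrated with a small sketch. This is not the paper's directory organization or protocol; it only shows one plausible way to split a 2^n × 2^n tile grid into nested square regions and to pick a home tile inside the requester's own region. The function names (`region_of`, `regional_home`, `hops`) and the address-modulo home mapping are assumptions made for illustration.

```python
def region_of(x, y, level):
    """Index of the level-`level` region containing tile (x, y).

    Assumption: a level-k region is a 2^k x 2^k block of tiles, so level 0
    is a single tile and level n is the whole 2^n x 2^n chip.
    """
    return (x >> level, y >> level)


def regional_home(x, y, level, block_addr):
    """Home tile for `block_addr` inside the requester's level-`level` region.

    Hypothetical mapping: the block address modulo the number of tiles in
    the region selects a tile in row-major order.
    """
    side = 1 << level
    rx, ry = region_of(x, y, level)
    slot = block_addr % (side * side)
    return (rx * side + slot % side, ry * side + slot // side)


def hops(a, b):
    """Manhattan distance between two tiles on a 2D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


if __name__ == "__main__":
    requester = (3, 1)   # tile on a 4 x 4 chip (n = 2)
    block_addr = 13      # arbitrary example address
    for level in (1, 2):
        home = regional_home(*requester, level, block_addr)
        print(level, home, hops(requester, home))
```

Under these assumptions, a request served by a level-k regional home travels at most 2(2^k − 1) mesh hops, whereas a chip-wide home can be up to 2(2^n − 1) hops away. That gap is the intuition behind serving requests within a region first.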
Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition), indexed in SCIE, EI, CSCD; 2010, Issue 2, pp. 246-256 (11 pages).
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60673145, 60773146 and 60833004.
Keywords: cache coherence protocol, hierarchical directory, chip multiprocessor


