Hierarchical Cache Directory for CMP (cited by: 4)


Abstract: As more processing cores are integrated into one chip and feature size continues to shrink, the average access latency for remote nodes under a directory-based coherence protocol grows, which greatly impacts system performance. Previous techniques such as data replication and data migration optimize the performance of the requesting core, but offer little improvement for neighbor nodes. Other techniques such as in-transit optimization try to reduce latency at the cost of increased storage. This paper introduces a hierarchical cache directory into the CMP (chip multiprocessor), which divides CMP tiles into multiple regions hierarchically, and combines it with data replication. A new directory organization is proposed to record the sharing status within a region and to help the regional home complete operations efficiently. Simulation results show that for a 16-core CMP, compared to a traditional directory, the hierarchical cache directory reduces average access latency by 9% and on-chip network traffic by 34% on average, with less storage. Theoretical analyses show that for a 2^n × 2^n tiled CMP, the average access latency in the hierarchical cache directory asymptotically approaches a function that is independent of n, hence the architecture is highly scalable.
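The abstract does not spell out how tiles are grouped into regions or how regional homes are chosen. As a rough illustration only, the sketch below assumes a quadtree-style split in which a level-k region covers 2^k × 2^k tiles; the function names `region_of`, `lowest_common_level` and `mean_hops_flat` are hypothetical and not taken from the paper. The last function computes the average hop count to a uniformly random home tile in a flat mesh, which shows why latency grows with n when there is no hierarchy. It does not reproduce the paper's theoretical analysis.

```python
from itertools import product


def region_of(x, y, level):
    """Id of the region containing tile (x, y) at the given level.

    Assumption: level 0 is a single tile and level k groups 2^k x 2^k
    tiles (a quadtree-style split, not confirmed by the abstract).
    """
    return (x >> level, y >> level)


def lowest_common_level(a, b, n):
    """Smallest level at which tiles a and b share a region in a 2^n x 2^n grid."""
    for level in range(n + 1):
        if region_of(*a, level) == region_of(*b, level):
            return level
    raise ValueError("tiles lie outside the 2^n x 2^n grid")


def mean_hops_flat(n):
    """Average Manhattan distance from a requester to a uniformly random
    home tile in a flat 2^n x 2^n mesh (no regions)."""
    side = 1 << n
    tiles = list(product(range(side), repeat=2))
    total = sum(abs(ax - bx) + abs(ay - by)
                for ax, ay in tiles for bx, by in tiles)
    return total / len(tiles) ** 2


if __name__ == "__main__":
    # Neighbouring tiles meet in a small region; opposite corners only at the top.
    print(lowest_common_level((0, 0), (1, 1), n=2))
    print(lowest_common_level((0, 0), (3, 3), n=2))
    # Flat-mesh average distance for growing meshes.
    for n in (1, 2, 3):
        print(n, mean_hops_flat(n))
```

Under this assumed split, a request that can be served inside a low-level region travels a distance bounded by that region's size rather than by the whole mesh, which is the intuition behind the scalability claim.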

Authors: 郭松柳, 王海霞, 薛一波, 李崇民, 汪东升

Affiliation: Department of Computer Science and Technology; Tsinghua National Laboratory of Information Science and Technology

Source: Journal of Computer Science & Technology (计算机科学技术学报（英文版）), indexed in SCIE, EI, CSCD; 2010, Issue 2, pp. 246-256 (11 pages)

Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60673145, 60773146 and 60833004.

Keywords: cache coherence protocol, hierarchical directory, chip multiprocessor

Classification: TP332 [Automation and Computer Technology: Computer System Architecture]





