
An approach to accessing unified memory address space of heterogeneous kilo-cores system

Cited by: 2
Abstract To let the heterogeneous cores of a processor (CPU and GPU) directly access each other's memory address space, this paper proposes a unified memory address space access method for heterogeneous kilo-core processor systems. The method builds a unified three-level Cache structure, tags the state of data blocks in the Cache, and optimizes the algorithm that modifies Cache block states. It thereby avoids the large extra memory-access overhead of copying and transferring data blocks in current discrete heterogeneous computer systems, in which GPUs are attached over PCI-E. In tests with a subset of the Rodinia benchmark programs, the method achieved a maximum system speedup of 9.8x and reduced memory accesses (load/store instructions) by up to 90%. These results indicate that the method effectively reduces the overhead of exchanging data blocks among heterogeneous cores and improves the performance speedup of heterogeneous kilo-core processor systems.
Source Journal of National University of Defense Technology (《国防科技大学学报》), indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2015, No. 1, pp. 28-33 (6 pages)
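Funding Open Fund of the State Key Laboratory of Computer Architecture (CARCH201206); National-Level Project Cultivation Fund of the University of Shanghai for Science and Technology (12XGQ07); Guiyang Science and Technology Plan Project (2011101414); Guizhou Province Science and Technology Support Project (20123050)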
Keywords heterogeneous kilo-cores processors; memory address space; directly access from opposite directions; Cache

