An approach to accessing unified memory address space of heterogeneous kilo-cores system

Original title: 异构千核处理器系统的统一内存地址空间访问方法

Cited by: 2

Abstract: To let the CPU and GPU cores of a heterogeneous processor directly access each other's memory address spaces, this paper proposes an approach to accessing a unified memory address space in a heterogeneous kilo-cores system. The approach builds a unified three-level Cache, tags the state of data blocks in the Cache, and optimizes the algorithms that modify Cache block states. As a result, the heterogeneous kilo-cores system avoids the large memory access overhead of copying and transferring data blocks that is incurred in current discrete heterogeneous systems, where GPUs are attached over PCI-E. In experiments with a subset of the Rodinia benchmark programs, the approach achieved a maximum system speedup of 9.8x and reduced load/store (memory access) instructions by up to 90%. These results indicate that the approach effectively reduces the overhead of exchanging data blocks among heterogeneous cores and improves the overall performance of a heterogeneous kilo-cores system.

Authors: Pei Songwen (裴颂文), Wu Xiaodong (吴小东), Tang Zuoqi (唐作其), Xiong Naixue (熊乃学)

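Affiliations: Department of Computer Science and Engineering, University of Shanghai for Science and Technology; State Key Laboratory of Computer Architecture, Chinese Academy of Sciences; Department of Electrical Engineering and Computer Science, University of California; College of Computer Science and Technology, Guizhou University; School of Computer Science, Colorado Technical University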

Source: Journal of National University of Defense Technology (《国防科技大学学报》), 2015, No. 1, pp. 28-33 (6 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.

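Funding: Open Project of the State Key Laboratory of Computer Architecture (CARCH201206); National-Level Project Cultivation Fund of the University of Shanghai for Science and Technology (12XGQ07); Guiyang Science and Technology Plan Project (2011101414); Guizhou Province Science and Technology Support Project (20123050)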

Keywords: heterogeneous kilo-cores processors; memory address space; direct access from opposite directions; Cache

Classification: TN95 [Electronics and Telecommunications: Signal and Information Processing]
