
Partitioning Acceleration Between CPU and DRAM: A Case Study on Accelerating Hash Joins in the Big Data Era
Cited by: 3
Abstract: Hardware acceleration is an effective way to improve the energy efficiency of existing computer systems. However, traditional hardware accelerators (e.g., GPUs, FPGAs, and customized accelerators) remain decoupled from main memory, so data movement between accelerator and memory is unavoidable, and reducing its energy cost remains a challenging problem, especially in the big data era. Near-data processing and 3D-stacked DRAM make it possible to integrate accelerators into the DRAM stack and greatly reduce the data-movement cost. However, because 3D-stacked DRAM imposes stringent area, power, and thermal constraints, it is nearly impossible to integrate into the DRAM all the computation units that a sufficiently complex function requires. A memory-side accelerator should therefore be designed with the partitioning of work between the CPU and the accelerator in mind. In this paper, we describe our experience partitioning the acceleration of hash joins, a key operation in databases and big data systems, using a data-movement-driven approach on a hybrid system that contains both memory-side customized accelerators and processor-side SIMD units. The memory-side accelerators accelerate the execution phases that are bounded by data movement, while the processor-side SIMD units accelerate the execution phases whose data-movement cost is negligible. Experimental results show that the hybrid accelerated system improves energy efficiency by up to 47.52x and 19.81x compared with the Intel Haswell and Xeon Phi processors, respectively. Moreover, our data-movement-driven design approach can easily be extended to guide acceleration design decisions for other emerging applications.
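The abstract splits hash joins into a phase bounded by data movement and a phase with little data movement. A minimal Python sketch of a textbook radix-partitioned hash join makes that split concrete. It is not the authors' PRO algorithm or hash partition accelerator (HPA); the function names `radix_partition`, `build_and_probe`, `radix_hash_join` and the `radix_bits` parameter are hypothetical. The partition pass, which reads and rewrites every tuple, is the kind of data-movement-bound phase the abstract assigns to the memory-side accelerator (presumably what the HPA in the keywords targets). The per-partition build and probe, which works on small slices, is the kind of phase it assigns to processor-side SIMD units.

```python
from collections import defaultdict


# Hypothetical helper: textbook radix partitioning, not the paper's HPA design.
def radix_partition(relation, radix_bits):
    """Scatter (key, payload) tuples into 2**radix_bits partitions by the
    low bits of hash(key). Every tuple is read and written back out, so this
    pass is dominated by data movement."""
    fanout = 1 << radix_bits
    mask = fanout - 1
    # Pass 1: histogram of tuples per partition.
    counts = [0] * fanout
    for key, _ in relation:
        counts[hash(key) & mask] += 1
    # Exclusive prefix sum: start offset of each partition in the output.
    offsets = []
    running = 0
    for c in counts:
        offsets.append(running)
        running += c
    # Pass 2: scatter each tuple into its partition's slot range.
    out = [None] * len(relation)
    cursor = list(offsets)
    for key, payload in relation:
        p = hash(key) & mask
        out[cursor[p]] = (key, payload)
        cursor[p] += 1
    bounds = [(offsets[p], offsets[p] + counts[p]) for p in range(fanout)]
    return out, bounds


# Hypothetical helper: join one pair of co-partitioned slices.
def build_and_probe(r_part, s_part):
    """Build a hash table on the r slice, then probe it with the s slice.
    In a real system radix_bits is chosen so each slice fits in cache,
    which makes this phase compute-bound rather than data-movement-bound."""
    table = defaultdict(list)
    for key, payload in r_part:
        table[key].append(payload)
    matches = []
    for key, s_payload in s_part:
        # .get() avoids inserting empty lists for keys absent from r.
        for r_payload in table.get(key, ()):
            matches.append((key, r_payload, s_payload))
    return matches


def radix_hash_join(r, s, radix_bits=4):
    """Equi-join r and s on key; returns (key, r_payload, s_payload) triples."""
    r_out, r_bounds = radix_partition(r, radix_bits)
    s_out, s_bounds = radix_partition(s, radix_bits)
    result = []
    # Equal keys always land in the same partition index in both relations.
    for (r_lo, r_hi), (s_lo, s_hi) in zip(r_bounds, s_bounds):
        result.extend(build_and_probe(r_out[r_lo:r_hi], s_out[s_lo:s_hi]))
    return result


if __name__ == "__main__":
    R = [(1, "a"), (2, "b"), (3, "c"), (17, "d")]
    S = [(1, "x"), (3, "y"), (3, "z"), (5, "w"), (17, "v")]
    print(sorted(radix_hash_join(R, S)))
    # → [(1, 'a', 'x'), (3, 'c', 'y'), (3, 'c', 'z'), (17, 'd', 'v')]
```

With `radix_bits=4`, keys 1 and 17 share partition 1, so the per-partition hash table still has to separate them. The two-pass histogram-then-scatter layout writes all partitions into one contiguous array without per-partition reallocation. Raising `radix_bits` shrinks partitions, which helps the build/probe phase stay in cache, at the cost of a wider scatter in the partition phase.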
Authors: Wu Linyang, Luo Rong, Guo Xueting, Guo Qi (Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190)
Source: Journal of Computer Research and Development (计算机研究与发展), indexed in EI, CSCD, and the Peking University Core Journals list; 2018, Issue 2, pp. 289-304 (16 pages)
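Funding: National Key R&D Program of China (2017YFB1003101); National Natural Science Foundation of China (61472396, 61432016, 61473275, 61522211, 61532016, 61521092, 61502446, 61672491, 61602441, 61602446, 61732002, 61702478); Beijing Municipal Science and Technology Program (Z151100000915072); Chinese Academy of Sciences STS Program; National Basic Research Program of China (973 Program) (2015CB358800)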
Keywords: 3D-stacked DRAM; accelerator; big data; hash joins; optimized version of the radix joins algorithm (PRO); hash partition accelerator (HPA)
