A prefetch architecture design based on a graphics processor compression architecture

Abstract: Memory access utilization has become one of the key bottlenecks limiting graphics processing unit (GPU) performance. In processor design, prefetching is one of the main methods for improving memory access utilization. Because GPU workloads are memory-intensive, improving prefetch performance while minimizing the impact on the normal efficiency of the graphics pipeline has become an active research direction. Building on a lossless compression architecture for graphics processors, this paper proposes a prefetch architecture for GPUs. The design effectively improves memory access utilization in memory-intensive graphics pipelines without degrading the efficiency of the current pipeline. Experimental results on the Godson GPU (GSGPU) platform show that, compared with a traditional prefetch architecture, the cache hit rate increases by more than 15% for memory-intensive test programs. For test programs in which memory access is mostly idle, the design has no negative impact on the pipeline.

Authors: ZHAO Shipeng; ZHANG Lizhi; ZHANG Longbing

Affiliations: State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; University of Chinese Academy of Sciences, Beijing 100049

Source: Chinese High Technology Letters (高技术通讯), CAS, 2022, No. 4, pp. 351-357 (7 pages)

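Funding: Supported by the National Natural Science Foundation of China (61521092, 61432016) and the Key Deployment Project of the Chinese Academy of Sciences (ZDRW-XH-2017-1).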

Keywords: graphics processing unit (GPU); memory access subsystem; prefetch architecture; compression architecture

Classification number: TP332 [Automation and Computer Technology: Computer System Architecture]
