三维高效视频编码(3D High Efficiency Video Coding,3D-HEVC)中视差估计算法存在处理数据量大、运算时间长和资源消耗大的问题,进一步提高算法执行效率对于3D-HEVC的推广应用具有十分重要的意义。在深入分析视差估计算法的并行性的基础...三维高效视频编码(3D High Efficiency Video Coding,3D-HEVC)中视差估计算法存在处理数据量大、运算时间长和资源消耗大的问题,进一步提高算法执行效率对于3D-HEVC的推广应用具有十分重要的意义。在深入分析视差估计算法的并行性的基础上,基于项目组开发的视频阵列处理器(DPR-CODEC),提出一种新的并行实现方案。在可重构阵列结构中完成了视差估计算法的并行映射、功能仿真和FPGA测试,显著减少了视差估计算法的执行时间。实验结果表明,所提出的并行实现方案相比于串行单PE执行时间节省了59%,基于可编程可重构阵列的并行实现在具有较高的执行效率的同时也具有较好的灵活性。展开更多
Transcendental functions are important functions in various high performance computing applications.Because these functions are time-consuming and the vector units on modern processors become wider and more scalable,t...Transcendental functions are important functions in various high performance computing applications.Because these functions are time-consuming and the vector units on modern processors become wider and more scalable,there is an increasing demand for developing and using vector transcendental functions in such performance-hungry applications.However,the performance of vector transcendental functions as well as their accuracy remain largely unexplored.To address this issue,we perform a comprehensive evaluation of two Single Instruction Multiple Data(SIMD)intrinsics based vector math libraries on two ARMv8 compatible processors.We first design dedicated microbenchmarks that help us understand the performance behavior of vector transcendental functions.Then,we propose a piecewise,quantitative evaluation method with a set of meaningful metrics to quantify their performance and accuracy.By analyzing the experimental results,we find that vector transcendental functions achieve good performance speedups thanks to the vectorization and algorithm optimization.Moreover,vector math libraries can replace scalar math libraries in many cases because of improved performance and satisfactory accuracy.Despite this,the implementations of vector math libraries are still immature,which means further optimization is needed,and our evaluation reveals feasible optimization solutions for future vector math libraries.展开更多
文摘三维高效视频编码(3D High Efficiency Video Coding,3D-HEVC)中视差估计算法存在处理数据量大、运算时间长和资源消耗大的问题,进一步提高算法执行效率对于3D-HEVC的推广应用具有十分重要的意义。在深入分析视差估计算法的并行性的基础上,基于项目组开发的视频阵列处理器(DPR-CODEC),提出一种新的并行实现方案。在可重构阵列结构中完成了视差估计算法的并行映射、功能仿真和FPGA测试,显著减少了视差估计算法的执行时间。实验结果表明,所提出的并行实现方案相比于串行单PE执行时间节省了59%,基于可编程可重构阵列的并行实现在具有较高的执行效率的同时也具有较好的灵活性。
基金supported by the National Key Research and Development Program of China under Grant No.2020YFA0709803the National Natural Science Foundation of China under Grant Nos.61902407 and 61802416.
文摘Transcendental functions are important functions in various high performance computing applications.Because these functions are time-consuming and the vector units on modern processors become wider and more scalable,there is an increasing demand for developing and using vector transcendental functions in such performance-hungry applications.However,the performance of vector transcendental functions as well as their accuracy remain largely unexplored.To address this issue,we perform a comprehensive evaluation of two Single Instruction Multiple Data(SIMD)intrinsics based vector math libraries on two ARMv8 compatible processors.We first design dedicated microbenchmarks that help us understand the performance behavior of vector transcendental functions.Then,we propose a piecewise,quantitative evaluation method with a set of meaningful metrics to quantify their performance and accuracy.By analyzing the experimental results,we find that vector transcendental functions achieve good performance speedups thanks to the vectorization and algorithm optimization.Moreover,vector math libraries can replace scalar math libraries in many cases because of improved performance and satisfactory accuracy.Despite this,the implementations of vector math libraries are still immature,which means further optimization is needed,and our evaluation reveals feasible optimization solutions for future vector math libraries.