Abstract
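The 3D gradient vector flow field (3D GVF field) is widely used in 3D image analysis algorithms. Computing it requires many iterations and is computationally expensive, so accelerating it is of considerable research value. This work presents the first acceleration and optimization of 3D GVF field computation for the Intel Xeon Phi many integrated core architecture. First, it exploits the natural parallelism among the pixels of a 3D image and the strengths of the many-core architecture through thread-level parallelism (multi-core) and data-level parallelism (SIMD). Second, because the computation of the 3D GVF field is a typical 3D 7-point stencil operation, an efficient data blocking strategy is proposed based on the L2 cache specifications of the Xeon Phi architecture. The strategy fully exploits the temporal and spatial locality of the data, effectively reduces the cache misses caused by stencil computation, and improves performance. Experimental results show that the stencil optimization techniques significantly speed up 3D GVF field computation: for an image of size 512³, the proposed method on a 57-core Xeon Phi platform achieves a speedup of up to 2.77 over a 2.6 GHz 8-core, 16-thread Intel Xeon E5-2670 CPU.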
The 3D Gradient Vector Flow (GVF) field has wide applications in many image processing algorithms. Computing the GVF field typically needs many iterations and is rather time consuming. Therefore, it is important and meaningful to improve the computation speed of the 3D GVF field. Data-level parallelism and thread-level parallelism are introduced to accelerate the GVF field computation procedure on the Intel Xeon Phi many integrated core platform for the first time. Meanwhile, GVF field computation is a kind of stencil computation, whose ratio of computation to memory access is low. A novel cache blocking strategy is proposed to fully utilize the L2 cache of the Xeon Phi architecture and to improve the computation speed of the GVF field. The experimental results show that the proposed optimizations effectively improve the speed of GVF field computation. In particular, for a 512³ 3D image, the speedup achieved on Xeon Phi is 2.77x compared with the performance obtained on a 2.6 GHz 8-core, 16-thread Intel Xeon E5-2670 CPU.
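The combination of techniques named in the abstract (a 3D 7-point stencil, thread-level parallelism, SIMD, and cache blocking) can be illustrated with a minimal C sketch. It is not the paper's implementation. It assumes the standard GVF update u ← u + dt·(μ∇²u − b·(u − c)) for one vector component, where b is the squared gradient magnitude of the edge map and c is the matching gradient component. The function name `gvf_sweep_blocked`, the y/z blocking scheme, and the block-size parameters `by` and `bz` are hypothetical choices made for illustration. The OpenMP pragmas are guarded so the code also builds serially.

```c
#include <assert.h>
#include <stddef.h>

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* One Jacobi-style sweep of a GVF-like update for a single vector component.
 * u: current component, b: squared gradient magnitude, c: gradient component.
 * Volumes are nx*ny*nz floats with x as the fastest (unit-stride) axis.
 * out must not alias u. Boundary voxels are copied through unchanged.
 * by, bz: hypothetical block sizes in y and z (assumed tuning parameters). */
void gvf_sweep_blocked(const float *u, const float *b, const float *c,
                       float *out, size_t nx, size_t ny, size_t nz,
                       float mu, float dt, size_t by, size_t bz)
{
    assert(by > 0 && bz > 0);
    const size_t sy = nx;       /* stride between neighbouring rows   */
    const size_t sz = nx * ny;  /* stride between neighbouring planes */

    /* Copy first so that boundary voxels have defined values. */
    for (size_t i = 0; i < nx * ny * nz; ++i) out[i] = u[i];
    if (nx < 3 || ny < 3 || nz < 3) return;

    /* Thread-level parallelism: independent (z, y) blocks go to threads. */
#ifdef _OPENMP
#pragma omp parallel for collapse(2) schedule(static)
#endif
    for (size_t zz = 1; zz < nz - 1; zz += bz) {
        for (size_t yy = 1; yy < ny - 1; yy += by) {
            const size_t zend = min_sz(zz + bz, nz - 1);
            const size_t yend = min_sz(yy + by, ny - 1);
            for (size_t z = zz; z < zend; ++z) {
                for (size_t y = yy; y < yend; ++y) {
                    const size_t row = z * sz + y * sy;
                    /* Data-level parallelism: unit-stride loop over x. */
#ifdef _OPENMP
#pragma omp simd
#endif
                    for (size_t x = 1; x < nx - 1; ++x) {
                        const size_t i = row + x;
                        /* 7-point stencil: centre plus six face neighbours. */
                        const float lap = u[i - 1] + u[i + 1]
                                        + u[i - sy] + u[i + sy]
                                        + u[i - sz] + u[i + sz]
                                        - 6.0f * u[i];
                        out[i] = u[i] + dt * (mu * lap - b[i] * (u[i] - c[i]));
                    }
                }
            }
        }
    }
}
```

Blocking only in y and z keeps the x loop long and contiguous for vectorization, while the neighbouring rows and planes that a block touches are reused while they are still cached. In practice `by` and `bz` would be chosen so that this working set fits in the per-core L2 cache; the values that suit a given Xeon Phi are not stated in the abstract and would need to be measured.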
Source
《计算机工程与科学》
CSCD
PKU Core (Peking University core journal list)
2014, No. 8, pp. 1435-1440 (6 pages)
Computer Engineering & Science
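Funding
National High Technology Research and Development Program of China (863 Program) (2012AA010903)
National Natural Science Foundation of China (61170049, 61303189)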