Abstract
A two-level pipeline architecture is proposed to reduce the high complexity of the sub-pixel interpolation process in H.264/AVC video decoding systems. The first pipeline level applies when the four 4×4 blocks inside an 8×8 partition share the same motion information: a two-stage pipeline, consisting of reference-pixel fetching and interpolation computation for each 4×4 block, lets the interpolation of different 4×4 blocks proceed in parallel. The second pipeline level accelerates the interpolation at each sub-pixel position by exploiting the independence of the half-pixel values from one another and the symmetry between the horizontal and vertical interpolation processes. The kernel interpolation unit consists of 13 six-tap filters, 4 bilinear interpolation filters, and 4 chroma interpolation filters. The parallel pipelined scheme reduces interpolation computation time by at least 75%. Experimental results show that, compared with other designs in the same field, the proposed architecture has a lower hardware cost, reduces external memory accesses by 47%, and improves sub-pixel interpolation performance by 30%.
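For orientation, the three filter types named in the abstract can be sketched in software using the arithmetic defined by the H.264/AVC standard. This is only a minimal illustration of the per-sample computations, not the hardware architecture of the paper; the function names (`half_pel`, `quarter_pel`, `chroma_pel`, `clip255`) are hypothetical, and the sketch assumes 8-bit samples and leaves out the centre half-pixel position, which the standard computes from unrounded intermediate values.

```python
def clip255(v):
    """Clamp a value to the 8-bit sample range (assumes 8-bit video)."""
    return max(0, min(255, v))


def half_pel(e, f, g, h, i, j):
    """Luma half-sample between g and h.

    Six-tap FIR with coefficients (1, -5, 20, 20, -5, 1), rounded and
    divided by 32, applied along one row or one column of integer samples.
    """
    return clip255((e - 5 * f + 20 * g + 20 * h - 5 * i + j + 16) >> 5)


def quarter_pel(a, b):
    """Luma quarter-sample: rounded average of two neighbouring
    integer or half samples (the bilinear step)."""
    return (a + b + 1) >> 1


def chroma_pel(A, B, C, D, dx, dy):
    """Chroma sample from the four surrounding integer samples.

    A, B are the top-left and top-right samples, C, D the bottom-left and
    bottom-right ones; dx, dy are the fractional offsets in 1/8-sample
    units (0..7). The four weights always sum to 64.
    """
    return ((8 - dx) * (8 - dy) * A + dx * (8 - dy) * B
            + (8 - dx) * dy * C + dx * dy * D + 32) >> 6


if __name__ == "__main__":
    row = [10, 12, 40, 44, 12, 10]   # made-up sample values
    h0 = half_pel(*row)              # half-sample between 40 and 44
    q0 = quarter_pel(row[2], h0)     # quarter-sample between 40 and h0
    print(h0, q0)
```

Each `half_pel` call reads only integer samples, so several half-pixel values can be evaluated at the same time, and the same function serves both rows and columns. These are the two properties the second pipeline level is described as exploiting.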
Source
Journal of Zhejiang University: Engineering Science (《浙江大学学报(工学版)》)
Indexed in: EI, CAS, CSCD, PKU Core Journals (北大核心)
2011, No. 7, pp. 1187-1193 (7 pages)