Funding: Supported by the Zhejiang Provincial Natural Science Foundation of China (No. LQ15F010001, LY16F020029) and the General Research Project of Zhejiang Provincial Education Department (No. Y201430479).
Abstract: Fractional motion estimation (FME) significantly improves video encoding efficiency. However, its high computational complexity limits real-time processing capability. Reducing the implementation complexity of FME is therefore a key problem, especially in hardware design. This paper presents a novel deeply pipelined FME interpolation architecture for the real-time realization of a full Ultra-HD H.265/HEVC video encoder. First, a pipelined interpolation architecture, together with an elegant processing order, is proposed to handle different search positions in parallel without pipeline stalls or data conflicts. Second, strategies for sharing interpolation results among search positions are exploited to reduce the memory cost. Finally, the structure of the interpolation filter is further optimized for an area-efficient implementation. As a result, the proposed design uses 41,917 slice LUTs on the Xilinx Kintex-7 FPGA platform at a working frequency of 308 MHz. The measured throughput reaches a record 1.238 Gpixels/s, which is sufficient for real-time encoding of 8192×4320 @ 30 fps video.
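The throughput claim in the abstract can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the quoted throughput counts one sample per pixel (for example, luma only), which the abstract does not state, and the helper name `required_pixel_rate` is hypothetical.

```python
def required_pixel_rate(width: int, height: int, fps: int) -> int:
    """Pixels per second needed to process a video in real time."""
    return width * height * fps


# Figures quoted in the abstract.
REPORTED_THROUGHPUT = 1.238e9  # pixels/s, measured throughput
CLOCK_HZ = 308e6               # Hz, working frequency on the Kintex-7

# Target format from the abstract: 8192x4320 at 30 fps.
required = required_pixel_rate(8192, 4320, 30)

# Ratio of reported throughput to the real-time requirement
# (assumes both are counted in the same units, e.g. luma pixels).
headroom = REPORTED_THROUGHPUT / required

# Average pixels processed per clock cycle implied by the two figures.
pixels_per_cycle = REPORTED_THROUGHPUT / CLOCK_HZ

print(f"required rate   : {required / 1e9:.3f} Gpixels/s")
print(f"headroom        : {headroom:.2f}x")
print(f"pixels per cycle: {pixels_per_cycle:.2f}")
```

Under these assumptions the target format needs roughly 1.06 Gpixels/s, so the reported 1.238 Gpixels/s leaves a margin of about 17%, and the two quoted figures imply an average of about four pixels per clock cycle.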