Abstract
Ordinary cameras capture video at a limited frame rate, which degrades the viewing experience in special playback modes, so post-processing to raise the frame rate is necessary. Video frame interpolation is one of the key techniques for this. It synthesizes an intermediate frame from two consecutive video frames and is widely used in film and television production and in slow-motion replay of sports highlights. Optical-flow-based interpolation methods can effectively estimate the motion of scenes and objects in a video, but they are limited by the speed of optical flow estimation and are therefore poorly suited to real-time video tasks. This paper proposes a new optical flow prediction model and applies it to video frame interpolation. First, the two consecutive input frames are downsampled several times without information loss to obtain inputs at different scales. Next, a convolutional neural network extracts features, and attention masks are built on the extracted features to strengthen their representational power; optical flow at the corresponding scale is then generated from these features. Finally, a fusion network aggregates the multi-scale optical flows into a single scale as the final output. The proposed method can be trained end to end, and it was trained and validated on a large-scale video frame interpolation benchmark dataset. The results show that the method produces high-quality interpolated frames at real-time speed and outperforms other state-of-the-art methods.
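The abstract does not say how the information-lossless downsampling is implemented. One common way to shrink spatial resolution without discarding any pixel is a space-to-depth rearrangement (also called pixel unshuffle), and the minimal sketch below assumes that technique purely for illustration. The function names `space_to_depth`, `depth_to_space`, and `build_pyramid`, and the factors 1, 2, and 4, are hypothetical and are not taken from the paper.

```python
def space_to_depth(frame, r):
    """Rearrange an H x W frame into an (H/r) x (W/r) grid of cells.

    Each cell holds the r*r pixels of one r x r block, so resolution
    drops by a factor of r while every pixel value is kept.
    """
    h, w = len(frame), len(frame[0])
    if h % r or w % r:
        raise ValueError("frame height and width must be divisible by r")
    return [
        [
            [frame[y * r + dy][x * r + dx] for dy in range(r) for dx in range(r)]
            for x in range(w // r)
        ]
        for y in range(h // r)
    ]


def depth_to_space(cells, r):
    """Invert space_to_depth, recovering the original H x W frame."""
    h, w = len(cells) * r, len(cells[0]) * r
    return [
        [cells[y // r][x // r][(y % r) * r + (x % r)] for x in range(w)]
        for y in range(h)
    ]


def build_pyramid(frame, factors=(1, 2, 4)):
    """Assumed multi-scale input: one lossless rearrangement per factor."""
    return {r: space_to_depth(frame, r) for r in factors}


# Toy single-channel 4 x 4 frame with pixel values 0..15.
frame = [[y * 4 + x for x in range(4)] for y in range(4)]
pyramid = build_pyramid(frame)

# Every scale can be inverted exactly, which is what "lossless" means here.
restored = {r: depth_to_space(cells, r) for r, cells in pyramid.items()}
```

In this sketch each coarser scale trades spatial size for channel depth, so a network operating on the coarse input sees a smaller grid while still having access to every original pixel value.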
Authors
马境远
王川铭
MA Jing-yuan; WANG Chuan-ming (BUPT Sensing Technology Research Institute (Jiangsu) Co., Ltd., Wuxi 214115, China; Beijing Key Lab of Intelligent Telecommunication Software and Multimedia, Beijing University of Posts and Telecomm, Beijing 100876, China)
Source
《小型微型计算机系统》
CSCD
Peking University Core Journals (北大核心)
2021, No. 12, pp. 2567-2571 (5 pages)
Journal of Chinese Computer Systems
Funding
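Supported by the National Natural Science Foundation of China (61872047)
Supported by the BUPT-Transsion (北邮-传音) "Visual Perception and Computing" Joint Laboratory project.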
Keywords
video frame interpolation
optical flow estimation
end-to-end training
feature fusion
attention mechanism