Partial Near-duplicate Video Detection Algorithm Based on Transformer Low-dimensional Compact Coding
Abstract: To address the shortcomings of existing partial near-duplicate video detection algorithms, namely high feature-storage consumption, low overall query efficiency, and feature extraction that ignores the subtle semantic differences between near-duplicate frames, this paper proposes a partial near-duplicate video detection algorithm based on Transformer low-dimensional compact coding. First, a Transformer-based feature encoder is proposed that learns the subtle semantic differences among a large number of near-duplicate frames. It applies a self-attention mechanism to the feature maps of individual frame regions during encoding, effectively reducing the dimensionality of the frame features while enhancing their representational capacity. The encoder is trained with a siamese network that learns the semantic similarities between near-duplicate frames without negative samples, eliminating the heavy and difficult work of annotating hard negatives and making training simpler and more efficient. Second, a key frame extraction method based on the video self-similarity matrix is proposed. It extracts rich but non-redundant key frames from the video, so that the key frame feature sequence describes the original video content more comprehensively, improving detection performance while greatly reducing the storage and computation overhead caused by redundant key frames. Finally, based on the low-dimensional compact encoded features of the key frames, a graph-network-based temporal alignment algorithm is used to detect and localize partial near-duplicate video segments. The proposed algorithm outperforms existing algorithms on the public partial near-duplicate video detection dataset VCDB.
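The abstract does not spell out how key frames are chosen from the self-similarity matrix, but a minimal sketch of the general idea (assuming a greedy selection rule and a hypothetical similarity threshold, not the paper's exact procedure) looks like this: compute the pairwise cosine similarities of all frame features, then keep a frame only if it is sufficiently dissimilar to every frame already kept.

```python
import numpy as np

def extract_keyframes(frame_feats: np.ndarray, sim_thresh: float = 0.85) -> list:
    """Greedily select a non-redundant set of key frames.

    frame_feats: (N, D) array of per-frame features.
    sim_thresh: hypothetical redundancy threshold; a frame is kept only
    if its cosine similarity to every previously kept frame is below it.
    """
    # L2-normalise so that dot products are cosine similarities
    feats = frame_feats / np.linalg.norm(frame_feats, axis=1, keepdims=True)
    # Full N x N self-similarity matrix of the video
    S = feats @ feats.T
    kept = [0]  # always keep the first frame
    for i in range(1, len(feats)):
        # Keep frame i only if it is not near-duplicate of any kept frame
        if S[i, kept].max() < sim_thresh:
            kept.append(i)
    return kept

# Toy example: frames 0/1 are identical, frames 2/3 are identical
feats = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
print(extract_keyframes(feats))  # → [0, 2]
```

Selecting against all previously kept frames (rather than only the immediately preceding frame) is what keeps the set "rich but non-redundant": a shot that reappears later in the video is not stored twice.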
Authors: WANG Ping, YU Zhenhuang, LU Lei (School of Information and Communication Engineering, Xi'an Jiaotong University, Xi'an 710049, China)
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2024, No. 5, pp. 108-116
Keywords: partial near-duplicate video detection; Transformer; video self-similarity matrix; keyframe extraction
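The abstract states that the siamese network learns from near-duplicate frame pairs without any negative samples, but it does not name the exact objective. A common negative-free siamese objective (an assumption here, in the style of stop-gradient methods such as SimSiam, not necessarily the paper's loss) maximizes the cosine similarity between the predictor output of one view and the detached encoder output of the other, symmetrically over both views:

```python
import numpy as np

def negative_cosine(p: np.ndarray, z: np.ndarray) -> float:
    """Negative cosine similarity between predictor output p and target z.

    In training, z would be treated as a constant (stop-gradient) so the
    network cannot collapse by dragging both branches to a trivial point.
    """
    p = p / np.linalg.norm(p)
    z = z / np.linalg.norm(z)
    return -float(p @ z)

def siamese_loss(p1, z2, p2, z1) -> float:
    """Symmetric loss over the two near-duplicate views of a frame."""
    return 0.5 * negative_cosine(p1, z2) + 0.5 * negative_cosine(p2, z1)

# Perfectly aligned branches reach the minimum loss of -1.0
v = np.array([1.0, 0.0])
w = np.array([0.0, 1.0])
print(siamese_loss(v, v, w, w))  # → -1.0
```

Because the loss only pulls near-duplicate views together, no hard-negative mining or annotation is required, which matches the training simplicity claimed in the abstract.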