
Self-supervised Graph Contrastive Learning for Video Question Answering
Abstract: Video question answering (VideoQA) is a cross-modal understanding task. Given a video and a related question, a model must generate the answer through interaction between the semantic information of the different modalities. In recent years, graph neural networks (GNNs) have driven notable progress in VideoQA owing to their strong capabilities in cross-modal information fusion and reasoning. However, most existing GNN-based approaches suffer from inherent weaknesses, namely overfitting or over-smoothing together with weak robustness and generalization, which keep VideoQA models from improving further. Motivated by the effectiveness and robustness of self-supervised contrastive learning in pre-training, this study applies the idea of graph data augmentation to VideoQA and proposes GMC, a self-supervised contrastive learning framework for graph networks. GMC uses two data augmentation operations, one targeting nodes and one targeting edges, to generate dissimilar subsamples, and it improves the accuracy and robustness of VideoQA models by increasing the consistency between the predicted distributions of the original graph samples and those of the generated subsamples. Experimental comparisons with existing state-of-the-art VideoQA models and with different GMC variants on public VideoQA datasets verify the effectiveness of the proposed framework.
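The abstract describes GMC only at a high level: node and edge augmentations produce subsamples, and training increases agreement between the predicted distributions of the original graph and of its subsamples. The Python sketch below is a toy illustration of that general idea, not the authors' implementation. Everything concrete in it is an assumption: node augmentation is modeled as random feature masking, edge augmentation as random edge dropping, the "VideoQA model" is a hypothetical one-layer mean-aggregation graph network with mean pooling and a linear answer classifier, and consistency is measured as the KL divergence from the original prediction to each augmented prediction. The helper names (`mask_nodes`, `drop_edges`, `predict`, `consistency_loss`) are hypothetical.

```python
import math
import random

# Toy graph representation (assumed, for illustration only):
#   features: list of node feature vectors (lists of floats), one per node
#   edges:    list of (src, dst) node-index pairs, treated as undirected


def mask_nodes(features, edges, p, rng):
    """Assumed node augmentation: zero out each node's features with probability p."""
    masked = [[0.0] * len(f) if rng.random() < p else list(f) for f in features]
    return masked, list(edges)


def drop_edges(features, edges, p, rng):
    """Assumed edge augmentation: remove each edge independently with probability p."""
    kept = [e for e in edges if rng.random() >= p]
    return [list(f) for f in features], kept


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(features, edges, weights):
    """Hypothetical model: one mean-aggregation layer, mean pooling, linear answer head."""
    n, dim = len(features), len(features[0])
    neighbors = [[i] for i in range(n)]  # include a self-loop for every node
    for s, d in edges:
        neighbors[s].append(d)
        neighbors[d].append(s)
    hidden = [[sum(features[j][k] for j in nb) / len(nb) for k in range(dim)]
              for nb in neighbors]
    pooled = [sum(h[k] for h in hidden) / n for k in range(dim)]  # graph embedding
    logits = [sum(w[k] * pooled[k] for k in range(dim)) for w in weights]
    return softmax(logits)  # distribution over candidate answers


def kl_divergence(p, q):
    """KL(p || q); terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def consistency_loss(features, edges, weights, p_node=0.2, p_edge=0.2, seed=0):
    """Sum of KL divergences from the original prediction to each augmented view."""
    rng = random.Random(seed)
    p_orig = predict(features, edges, weights)
    p_node_view = predict(*mask_nodes(features, edges, p_node, rng), weights)
    p_edge_view = predict(*drop_edges(features, edges, p_edge, rng), weights)
    return kl_divergence(p_orig, p_node_view) + kl_divergence(p_orig, p_edge_view)


# Usage with made-up numbers: 3 nodes, 2-dim features, 3 candidate answers.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, 1), (1, 2)]
weights = [[0.5, -0.2], [-0.3, 0.8], [0.1, 0.1]]
print(consistency_loss(features, edges, weights))
```

In a real training loop this term would presumably be combined with the supervised answer loss through a weighting coefficient and minimized by gradient descent; the sketch only computes the loss value, since the abstract does not specify how GMC combines or optimizes its objectives.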
Authors: YAO Xuan, GAO Jun-Yu, XU Chang-Sheng (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100190, China; Pengcheng Laboratory, Shenzhen 518055, China)
Source: Journal of Software, 2023, No. 5, pp. 2083-2100 (18 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
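Funding: Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2020AAA0106200); National Natural Science Foundation of China (62036012, U21B2044, 62102415, 62072286, 61721004); Open Research Project of Zhejiang Lab (2022RC0AB02); CCF-Hikvision "Bar-headed Goose" Fund (20210004).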
Keywords: graph contrastive learning; video question answering; graph data augmentation; pre-training