Journal Article

Weakly supervised action anticipation without object annotations

Abstract: Anticipating future actions without observing any partial videos of those actions plays an important role in action prediction and is a challenging task. To obtain abundant information for action anticipation, some methods integrate multimodal contexts, including scene object labels. However, exhaustively labelling each frame in video datasets requires considerable effort. In this paper, we develop a weakly supervised method that integrates global motion and local fine-grained features from current action videos to predict the next action label without the need for specific scene context labels. Specifically, we extract diverse types of local features with weakly supervised learning, including object appearance and human pose representations, without ground truth. Moreover, we construct a graph convolutional network to exploit the inherent relationships between humans and objects in the current scene. We evaluate the proposed model on two datasets, the MPII-Cooking dataset and the EPIC-Kitchens dataset, and demonstrate the generalizability and effectiveness of our approach for action anticipation.
Source: Frontiers of Computer Science (SCIE, EI, CSCD), 2023, No. 2, pp. 101-110 (10 pages).
Funding: supported partially by the National Natural Science Foundation of China (NSFC) (Grant Nos. U1911401 and U1811461); Guangdong NSF Project (2020B1515120085, 2018B030312002); Guangzhou Research Project (201902010037); Research Projects of Zhejiang Lab (2019KD0AB03); and the Key-Area Research and Development Program of Guangzhou (202007030004).
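The abstract mentions a graph convolutional network over human and object nodes. As a minimal sketch only, the following shows one standard GCN propagation step (symmetric-normalized adjacency with self-loops) applied to a toy human-object scene graph; the paper's actual graph construction, node features, and layer design are not specified here, so all names and shapes below are illustrative assumptions.

```python
import numpy as np

def gcn_layer(A, X, W):
    """One GCN propagation step: ReLU(D^-1/2 (A+I) D^-1/2 X W).

    A: (n, n) adjacency matrix of the scene graph
    X: (n, f_in) per-node features (e.g. pose or appearance embeddings)
    W: (f_in, f_out) learnable projection
    """
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    D_inv_sqrt = np.diag(d_inv_sqrt)          # symmetric degree normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0.0)

# Hypothetical toy graph: node 0 = human, nodes 1-2 = objects it interacts with.
A = np.array([[0., 1., 1.],
              [1., 0., 0.],
              [1., 0., 0.]])
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 8))   # 8-dim node features (illustrative)
W = rng.standard_normal((8, 4))   # projection to 4 dims (illustrative)
H = gcn_layer(A, X, W)
print(H.shape)
```

Each node's output aggregates its neighbors' features, so after one layer the human node already mixes information from both objects, which is the kind of human-object relational reasoning the abstract describes.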