Funding: National Natural Science Foundation of China (Grant No. U20B2069) and the Fundamental Research Funds for the Central Universities.
Abstract: Learning the activities and interactions of small groups is a key step in understanding team sports videos. Recent research on team sports videos has largely adopted the perspective of the audience rather than that of the athlete. In team sports videos such as volleyball and basketball videos, there are plenty of intra-team and inter-team relations. In this paper, a new task named Group Scene Graph Generation is introduced to better understand intra-team and inter-team relations in sports videos. To tackle this problem, a novel Hierarchical Relation Network is proposed. After all players in a video are divided into two teams, the features of the two teams' activities and interactions are enhanced by Graph Convolutional Networks and then recognized to generate the Group Scene Graph. For evaluation, a Volleyball+ dataset is proposed, built on the Volleyball dataset with 9660 additional team activity labels. A baseline is set for comparison, and our experimental results demonstrate the effectiveness of our method. Moreover, the idea behind our method can be directly applied to another video-based task, Group Activity Recognition. Experiments show the superiority of our method and reveal the link between the two tasks. Finally, from the athlete's view, we present an interpretation that shows how to use the Group Scene Graph to analyze team activities and provide professional gaming suggestions.
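The core step above, enhancing per-player features along intra-team relations with a Graph Convolutional Network, can be sketched in miniature. This is not the authors' implementation; the row-normalised same-team adjacency, the toy features, and the single ReLU layer are illustrative assumptions.

```python
# Minimal GCN sketch: one layer computing ReLU(A_hat @ H @ W), where
# A_hat connects players on the same team. Pure Python for clarity.

def gcn_layer(adj, feats, weight):
    """One graph-convolution step: out = ReLU(adj @ feats @ weight)."""
    n, d_in = len(feats), len(feats[0])
    d_out = len(weight[0])
    # feats @ weight
    hw = [[sum(feats[i][k] * weight[k][j] for k in range(d_in))
           for j in range(d_out)] for i in range(n)]
    # adj @ hw, then ReLU
    return [[max(0.0, sum(adj[i][k] * hw[k][j] for k in range(n)))
             for j in range(d_out)] for i in range(n)]

def team_adjacency(team_of):
    """Row-normalised adjacency linking players on the same team
    (self-loops included) -- a stand-in for the intra-team graph."""
    n = len(team_of)
    adj = [[1.0 if team_of[i] == team_of[j] else 0.0 for j in range(n)]
           for i in range(n)]
    for row in adj:
        s = sum(row)
        for j in range(n):
            row[j] /= s
    return adj

# Two teams of two players, 2-D toy features, identity weight matrix.
team_of = [0, 0, 1, 1]
feats = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [4.0, 0.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]
enhanced = gcn_layer(team_adjacency(team_of), feats, weight)
print(enhanced)  # each player's feature becomes its team's average
```

After one layer, each player's feature is mixed with its teammates', so team-level activity cues accumulate before recognition.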
Funding: This work was supported by the National Natural Science Foundation of China (Nos. 62173045 and 61673192), the Fundamental Research Funds for the Central Universities (No. 2020XD-A04-2), and the BUPT Excellent Ph.D. Students Foundation (No. CX2021222).
Abstract: Scene graphs of point clouds help to understand object-level relationships in 3D space. Most graph generation methods work on 2D structured data and cannot be applied to unstructured 3D point cloud data. Existing point-cloud-based methods generate the scene graph from an additional graph structure that requires labor-intensive manual annotation. To address these problems, we explore a method that converts point clouds into structured data and generates graphs without given structures. Specifically, we cluster points with similar augmented features into groups and establish their relationships, resulting in an initial structural representation of the point cloud. Moreover, we propose a Dynamic Graph Generation Network (DGGN) to predict the semantic labels of targets at different granularities. It dynamically splits and merges point groups, resulting in a scene graph with high precision. Experiments show that our method outperforms the baseline methods: it outputs reliable graphs describing object-level relationships without additional manually labeled data.
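The first step, clustering points with similar augmented features into groups, can be illustrated with a greedy grouping pass. This is a hedged sketch, not DGGN: the distance threshold, the first-match assignment rule, and the running-centroid update are assumptions chosen for brevity.

```python
# Greedy grouping sketch: assign each point to the first group whose
# centroid lies within `threshold`, otherwise start a new group.

import math

def group_points(points, threshold=1.0):
    """Return a group label per point; centroids are updated online."""
    centroids, counts, labels = [], [], []
    for p in points:
        best = None
        for g, c in enumerate(centroids):
            if math.dist(p, c) <= threshold:  # similar-feature test
                best = g
                break
        if best is None:
            centroids.append(list(p))  # open a new group
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            counts[best] += 1          # fold point into running centroid
            n = counts[best]
            centroids[best] = [(c * (n - 1) + x) / n
                               for c, x in zip(centroids[best], p)]
            labels.append(best)
    return labels

pts = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
print(group_points(pts))  # two well-separated clusters -> [0, 0, 1, 1]
```

Once points are grouped, inter-group edges can be established to form the initial structural representation that the dynamic split/merge stage then refines.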
Abstract: A scene graph is a structured graph describing the content of an image. Two problems arise in its generation: 1) two-stage scene graph generation methods lose useful information, making the task harder; 2) the long-tailed distribution of visual relations causes the model to overfit and raises the relation-reasoning error rate. To address these two problems, this paper proposes SGiF (Scene Graph in Features), a scene graph generation model combining multi-scale feature maps and cyclic relation reasoning. First, the probability that a visual relation exists at each point on the multi-scale feature maps is computed, and the features of high-probability points are extracted. Then, subject-object pairs are decoded from the extracted features and deduplicated according to the category differences of the decoded results, yielding the scene graph structure. Finally, cycles containing the target relation edge are detected in the scene graph structure; the other edges on each cycle serve as input for computing an adjustment factor, which adjusts the original relation-reasoning result and completes the scene graph generation. With SGGen and PredCls as evaluation settings, experiments on a subset of the large scene graph generation dataset VG (Visual Genome) show that, by using multi-scale feature maps, SGiF improves the visual relation detection hit rate by 7.1% over the two-stage baseline, and by using cyclic relation reasoning, it improves the relation-reasoning hit rate by 2.18% over the non-cyclic baseline, demonstrating the effectiveness of SGiF.
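The final step above, adjusting a target relation's score using the other edges on a cycle that contains it, can be sketched as follows. This is an illustration, not SGiF: the adjustment rule (scale the target score by the mean score of the other cycle edges) and the toy edge scores are assumptions for demonstration only.

```python
# Cycle-based relation adjustment sketch: given a cycle of relation
# edges that includes the target edge, rescale the target's score by
# the mean confidence of the remaining edges on the cycle.

def adjust_with_cycle(scores, cycle, target):
    """scores: dict edge -> confidence; cycle: list of edges forming a
    loop that includes `target`. Returns the adjusted target score."""
    others = [scores[e] for e in cycle if e != target]
    factor = sum(others) / len(others)  # assumed adjustment factor
    return scores[target] * factor

# Hypothetical 3-edge cycle between three detected objects.
scores = {("man", "horse"): 0.6, ("horse", "field"): 0.9,
          ("man", "field"): 0.8}
cycle = [("man", "horse"), ("horse", "field"), ("man", "field")]
adjusted = adjust_with_cycle(scores, cycle, ("man", "horse"))
print(adjusted)
```

The intuition is that a relation supported by a consistent cycle of confident relations keeps a high score, while one on a weak cycle is suppressed.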