Abstract
3D visual question answering helps people understand spatial information and has broad application prospects in areas such as early childhood education. 3D scene information is complex, and most existing methods answer the question directly; when faced with complex questions, they tend to overlook contextual details, which degrades performance. To address this problem, this paper proposes a 3D visual question answering method based on sub-question asymptotic (progressive) reasoning, which uses text analysis to construct multiple simple sub-questions for each complex original question. The model learns context information while answering the sub-questions, which helps it understand the meaning of the complex question, and it finally uses the accumulated joint information to derive the answer to the original question. The sub-questions and the original question form an asymptotic reasoning relationship, which makes the model's errors explicitly interpretable and traceable. Experiments on the existing 3D dataset ScanQA show that the proposed method achieves 51.49% and 61.68% on the EM@10 and CIDEr metrics, respectively, surpassing other existing 3D visual question answering methods and confirming the effectiveness of the method.
Authors
李长健
杨昱威
肖枭
雷印杰
Li Changjian; Yang Yuwei; Xiao Xiao; Lei Yinjie (College of Electronics & Information Engineering, Sichuan University, Chengdu 610065, China)
Source
《计算机应用研究》
CSCD
Peking University Core Journal
2023, No. 4, pp. 987-990, 995 (5 pages in total)
Application Research of Computers
Funding
National Key Research and Development Program of China (2021YFC3300305).
Keywords
3D visual question answering
original question
sub-question
asymptotic reasoning
context information