Spatial-Semantic Collaborative Graph Network for Textbook Question Answering

  • Yaxian Wang
  • Bifan Wei
  • Jun Liu
  • Qika Lin
  • Lingling Zhang
  • Yaqiang Wu
  • Xi'an Jiaotong University
  • Lenovo

Research output: Contribution to journal, Article, peer-reviewed

11 Citations (Scopus)

Abstract

The Textbook Question Answering (TQA) task requires answering questions by reasoning over both the given diagrams and the text context. The task poses two main challenges. First, diagrams differ from natural images: similar shapes or color blocks may express different semantics, and diagrams vary widely even within a single topic. This visual semantic ambiguity and variable visual appearance make diagram understanding more challenging. Second, the text belongs to a specific education domain whose terminology differs greatly from that of the general domain, so it is difficult to represent the text semantics effectively with a text encoder pretrained in the general domain. In this paper, we propose a Spatial-Semantic Collaborative Graph Network (SSCGN) for the TQA task, which can help enhance diagram and text understanding and facilitate multimodal reasoning. Specifically, the Spatial-guided Semantic Enhancing (SSE) module exploits the spatial and semantic relationships between visual objects and OCR tokens to collaboratively enhance the semantic understanding of diagrams. Moreover, building on the semantically enhanced region representations from the SSE module, the Fine-grained Spatial-Aware Graph Network (FSA-GN) captures more fine-grained spatial relationships to help obtain richer relation-aware region representations for joint reasoning. We further propose multiple self-supervised auxiliary tasks that enhance the initial diagram and text semantic representations by pretraining the diagram encoder and the text encoder. Extensive experiments and ablation studies are conducted to validate the effectiveness of SSCGN.
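As a rough illustration of what "spatial relationships between visual objects and OCR tokens" can mean in practice, the sketch below labels each pair of diagram regions with a coarse spatial relation derived from their bounding boxes. It is not the SSCGN implementation: the relation vocabulary, the function names (`iou`, `spatial_relation`, `build_edges`), and the `(x1, y1, x2, y2)` box format are all assumptions made for this example, and the abstract does not specify how the SSE module or FSA-GN actually encode spatial relations.

```python
from itertools import combinations


def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def spatial_relation(a, b):
    """Hypothetical coarse label for where box b lies relative to box a.

    Uses image coordinates, so y grows downward.
    """
    if iou(a, b) > 0:
        return "overlap"
    dx = (b[0] + b[2]) / 2 - (a[0] + a[2]) / 2
    dy = (b[1] + b[3]) / 2 - (a[1] + a[3]) / 2
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "below" if dy > 0 else "above"


def build_edges(regions):
    """Return (source, target, relation) triples for every region pair.

    `regions` maps a region name (visual object or OCR token) to its box.
    """
    return [
        (name_a, name_b, spatial_relation(box_a, box_b))
        for (name_a, box_a), (name_b, box_b) in combinations(regions.items(), 2)
    ]


# Illustrative regions only; these are not taken from the TQA dataset.
regions = {
    "leaf": (0, 0, 10, 10),         # visual object
    "label_leaf": (12, 2, 20, 8),   # OCR token next to the leaf
    "root": (0, 20, 10, 30),        # visual object further down
}
edges = build_edges(regions)
```

Labeled edges of this kind could serve as the relation types in a graph over regions; a finer-grained scheme would replace the four directional labels with more detailed categories or continuous offsets.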

Original language: English
Pages (from-to): 3214-3228
Number of pages: 15
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 33
Issue number: 7
DOI
Publication status: Published - 1 Jul 2023
