Abstract
Textbook Question Answering (TQA) is a complex multimodal task that requires inferring answers from lengthy context descriptions and abundant diagrams. Compared with Visual Question Answering (VQA), TQA contains many uncommon terms and a wide variety of diagram inputs. This poses new challenges for the ability of language models to represent domain-specific spans. TQA also requires the model to take full advantage of the complementary information in different diagram types, which makes multimodal fusion considerably more complex. To tackle these issues, we propose a novel model named MoCA, which incorporates Multi-stage domain pretraining and Cross-guided multimodal Attention for the TQA task. First, we introduce a multi-stage domain pretraining module that performs unsupervised post-pretraining with a span masking strategy, followed by supervised pre-finetuning. For domain post-pretraining in particular, we propose a heuristic generation algorithm that exploits a terminology corpus. Second, to make full use of the rich context and diagram inputs, we propose a cross-guided multimodal attention mechanism that progressively updates the features of the text, the question diagram and the instructional diagram. Further, a dual gating mechanism is adopted to improve the ensemble of models built on three background retrievals. The experimental results show the superiority of our model, which outperforms state-of-the-art methods on both the validation and test splits. Ablation and comparison experiments further verify the effectiveness of each proposed module.
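The abstract does not spell out MoCA's span mask strategy. As a rough illustration only, the sketch below implements generic SpanBERT-style span masking: contiguous spans with geometrically distributed lengths are masked until about 15% of the tokens are covered. The function name `span_mask` and the parameters `mask_ratio`, `p` and `max_span` are hypothetical choices, not MoCA's settings, and the sketch always substitutes the mask token instead of using BERT's 80/10/10 replacement scheme.

```python
import random


def span_mask(tokens, mask_ratio=0.15, p=0.2, max_span=10, mask_token="[MASK]", seed=None):
    """Hypothetical SpanBERT-style span masking (not MoCA's exact procedure).

    Masks contiguous, non-overlapping spans until roughly mask_ratio of the
    tokens are covered. Span lengths follow a geometric distribution with
    success probability p, clipped at max_span.
    """
    rng = random.Random(seed)
    n = len(tokens)
    budget = max(1, round(n * mask_ratio))  # number of tokens we aim to mask
    masked = set()
    attempts = 0
    # Bound the number of attempts so the loop always terminates.
    while len(masked) < budget and attempts < 10 * n:
        attempts += 1
        # Sample a span length from Geometric(p), clipped to max_span.
        length = 1
        while rng.random() > p and length < max_span:
            length += 1
        # Never exceed the remaining masking budget.
        length = min(length, budget - len(masked))
        start = rng.randrange(0, n - length + 1)
        span = range(start, start + length)
        # Skip spans that overlap positions already masked.
        if any(i in masked for i in span):
            continue
        masked.update(span)
    out = [mask_token if i in masked else t for i, t in enumerate(tokens)]
    return out, sorted(masked)


if __name__ == "__main__":
    words = "the mitochondria is the powerhouse of the eukaryotic cell and produces atp".split()
    masked_words, positions = span_mask(words, seed=42)
    print(masked_words, positions)
```

Masking whole spans rather than isolated tokens forces the model to reconstruct multi-token terms from context, which is the general motivation for span masking on terminology-heavy text.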
| Original language | English |
|---|---|
| Article number | 109588 |
| Journal | Pattern Recognition |
| Volume | 140 |
| State | Published - Aug 2023 |
Keywords
- Attention
- Multimodal
- Pretraining
- Textbook question answering
Fingerprint
Dive into the research topics of 'MoCA: Incorporating domain pretraining and cross attention for textbook question answering'. Together they form a unique fingerprint.