
Alignment Relation is What You Need for Diagram Parsing

  • Xinyu Zhang
  • Lingling Zhang
  • Xin Hu
  • Jun Liu
  • Shaowei Wang
  • Qianying Wang
  • Xi'an Jiaotong University
  • Chang'an University
  • Lenovo

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)

Abstract

As a knowledge carrier, the diagram appears in many aspects of human life, such as textbooks, architectural drawings, and documents. Unlike natural images, diagrams have sparser representations of visual elements, and similar visual representations can reflect dissimilar semantics. As a result, current methods fail to capture visual elements with precise semantics. To address this issue, we treat aligned visual and textual elements as pairs, which assigns the precise semantics of textual elements to visual elements. We build the first diagram dataset, named align diagram element (ADE), which includes annotations for alignment relations between visual and textual elements. We also propose a visual-textual alignment model (VTAM) with graph construction and optimal aligning phases. In the graph construction phase, relational graphs are constructed between different elements using four relational operators, which measure the relations between elements according to distance, connection line, inclusion, and feature similarity. In the optimal aligning phase, the representation of each visual-textual pair is refined as a weighted sum of its representations on all relational graphs. Experimental results show that VTAM achieves a significant improvement of 10.9% over the current best competitor on the mean over test folds of the ADE dataset. To explore the role of alignment relations in diagram parsing, we apply VTAM to diagram-related tasks such as diagram question answering (DQA), achieving improvements of 2.8% to 5.9% and 4.6% to 5.1% on AI2D and Foodwebs, respectively, after adding VTAM. Our dataset and code are released at: https://github.com/ADE-dataset/ADE-dataset.
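
The two phases named in the abstract can be illustrated with a toy sketch. This is not the authors' VTAM implementation. It assumes each element is a dictionary with a hypothetical `box` (x0, y0, x1, y1) and `feat` vector, it hand-codes three of the four relations (distance, inclusion, feature similarity) and omits the connection-line operator, and it replaces the learned weighting with fixed, made-up weights. Each text element is simply matched to the visual element with the highest weighted score.

```python
import math

def center(box):
    """Center point of an (x0, y0, x1, y1) box."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)

def distance_score(vbox, tbox):
    """Closer box centers give a score nearer to 1."""
    (vx, vy), (tx, ty) = center(vbox), center(tbox)
    return 1.0 / (1.0 + math.hypot(vx - tx, vy - ty))

def inclusion_score(vbox, tbox):
    """Fraction of the text box area that lies inside the visual box."""
    ix = max(0.0, min(vbox[2], tbox[2]) - max(vbox[0], tbox[0]))
    iy = max(0.0, min(vbox[3], tbox[3]) - max(vbox[1], tbox[1]))
    area = (tbox[2] - tbox[0]) * (tbox[3] - tbox[1])
    return (ix * iy) / area if area > 0 else 0.0

def cosine_score(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def align(visual, textual, weights=(0.4, 0.3, 0.3)):
    """For each textual element, return the index of the best-scoring visual element.

    The weights are illustrative constants, not learned values.
    """
    w_dist, w_incl, w_feat = weights
    pairs = []
    for t in textual:
        scores = [
            w_dist * distance_score(v["box"], t["box"])
            + w_incl * inclusion_score(v["box"], t["box"])
            + w_feat * cosine_score(v["feat"], t["feat"])
            for v in visual
        ]
        pairs.append(max(range(len(scores)), key=scores.__getitem__))
    return pairs

# Made-up example: two visual regions, each containing one text label.
visual = [
    {"box": (0, 0, 10, 10), "feat": [1.0, 0.0]},
    {"box": (20, 0, 30, 10), "feat": [0.0, 1.0]},
]
textual = [
    {"box": (2, 2, 4, 4), "feat": [1.0, 0.0]},
    {"box": (22, 2, 24, 4), "feat": [0.0, 1.0]},
]
pairs = align(visual, textual)
print(pairs)  # [0, 1]
```

A per-label argmax can assign several labels to the same visual element. A one-to-one assignment step would be a natural refinement, though the abstract does not specify how the optimal aligning phase resolves such conflicts.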

Original language: English
Pages (from-to): 2131-2144
Number of pages: 14
Journal: IEEE Transactions on Image Processing
Volume: 33
DOI
Publication status: Published - 2024
