TY - JOUR
T1 - Fs-DSM
T2 - Few-Shot Diagram-Sentence Matching via Cross-Modal Attention Graph Model
AU - Hu, Xin
AU - Zhang, Lingling
AU - Liu, Jun
AU - Zheng, Qinghua
AU - Zhou, Jianlong
N1 - Publisher Copyright:
© 1992-2012 IEEE.
PY - 2021
Y1 - 2021
N2 - Diagram-sentence matching is a valuable research topic because it can help learners effectively understand diagrams with the assistance of sentences. However, diagrams and sentences contain many uncommon objects, i.e., few-shot contents. Existing methods for image-sentence matching have great limitations when applied to diagrams, because they focus on high-frequency objects during training and ignore uncommon ones. In addition, the specialized nature of diagrams makes their semantics non-intuitive. In this work, we propose a cross-modal attention graph model for the few-shot diagram-sentence matching task, named Fs-DSM. Specifically, it is composed of three modules. The graph initialization module takes region-level diagram features and word-level sentence features as the nodes of Fs-DSM, with edges represented as similarities between nodes. The information propagation module is the key component of Fs-DSM: few-shot contents are recognized by an uncommon object recognition strategy, the nodes are then updated by a neighborhood aggregation procedure with cross-modal propagation between all visual and textual nodes, and the edges are recomputed from the new node features. The global association module integrates the features of regions and words to represent the global diagrams and sentences. Through comprehensive experiments on both few-shot and conventional image-sentence matching, we demonstrate that Fs-DSM achieves superior performance over competitors on the AI2D♯ diagram dataset and two public benchmark datasets with natural images.
AB - Diagram-sentence matching is a valuable research topic because it can help learners effectively understand diagrams with the assistance of sentences. However, diagrams and sentences contain many uncommon objects, i.e., few-shot contents. Existing methods for image-sentence matching have great limitations when applied to diagrams, because they focus on high-frequency objects during training and ignore uncommon ones. In addition, the specialized nature of diagrams makes their semantics non-intuitive. In this work, we propose a cross-modal attention graph model for the few-shot diagram-sentence matching task, named Fs-DSM. Specifically, it is composed of three modules. The graph initialization module takes region-level diagram features and word-level sentence features as the nodes of Fs-DSM, with edges represented as similarities between nodes. The information propagation module is the key component of Fs-DSM: few-shot contents are recognized by an uncommon object recognition strategy, the nodes are then updated by a neighborhood aggregation procedure with cross-modal propagation between all visual and textual nodes, and the edges are recomputed from the new node features. The global association module integrates the features of regions and words to represent the global diagrams and sentences. Through comprehensive experiments on both few-shot and conventional image-sentence matching, we demonstrate that Fs-DSM achieves superior performance over competitors on the AI2D♯ diagram dataset and two public benchmark datasets with natural images.
KW - attention
KW - Diagram understanding
KW - few-shot learning
KW - graph neural network
UR - https://www.scopus.com/pages/publications/85115803624
U2 - 10.1109/TIP.2021.3112294
DO - 10.1109/TIP.2021.3112294
M3 - Article
C2 - 34554913
AN - SCOPUS:85115803624
SN - 1057-7149
VL - 30
SP - 8102
EP - 8115
JO - IEEE Transactions on Image Processing
JF - IEEE Transactions on Image Processing
ER -