
LANDMARK: language-guided representation enhancement framework for scene graph generation

  • Southeast University, Nanjing

Research output: Contribution to journal, Article, peer-reviewed

5 Citations (Scopus)

Abstract

Scene graph generation (SGG) is a sophisticated task that suffers from both complex visual features and the long-tail problem. Recently, various unbiased strategies have been proposed by designing novel loss functions and data balancing strategies. Unfortunately, these unbiased methods fail to emphasize language priors from the feature refinement perspective. Inspired by the fact that predicates are highly correlated with the semantics hidden in the subject-object pair and the global context, we propose LANDMARK (LANguage-guiDed representation enhanceMent frAmewoRK), which learns predicate-relevant representations from language-vision interactive patterns, global language context, and object-predicate correlation. Specifically, we first project object labels to three distinctive semantic embeddings for different representation learning. Then, the Language Attention Module (LAM) and the Experience Estimation Module (EEM) process subject-object word embeddings into an attention vector and a predicate distribution, respectively. The Language Context Module (LCM) encodes global context from each word embedding, which avoids isolated learning from local information. Finally, the module outputs are used to update the visual representations and the SGG model’s prediction. All language representations are generated purely from object categories, so no extra knowledge is needed. This framework is model-agnostic and consistently improves performance on existing SGG models. In addition, its representation-level unbiased strategies make LANDMARK compatible with other methods. Code is available at https://github.com/rafa-cxg/PySGG-cxg .
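The data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (their code is at the repository linked above). Every concrete choice here is an assumption made for illustration: the sigmoid gate standing in for LAM, the linear layer plus softmax standing in for EEM, the mean pooling standing in for LCM, the additive fusion, the random weights, and all names such as `refine` and `W_EXP`.

```python
import math
import random

random.seed(0)

OBJECTS = ["person", "horse", "hat"]        # toy object categories
PREDICATES = ["riding", "wearing", "near"]  # toy predicate classes
DIM = 4                                     # toy feature width


def rand_vec(n):
    return [random.uniform(-1.0, 1.0) for _ in range(n)]


def make_table():
    """One embedding table: object label -> vector (random, untrained)."""
    return {name: rand_vec(DIM) for name in OBJECTS}


# Three separate embeddings of the same labels, one per module.
EMB_ATT, EMB_EXP, EMB_CTX = make_table(), make_table(), make_table()
# Hypothetical EEM weights: one row per predicate over [subject; object].
W_EXP = [rand_vec(2 * DIM) for _ in PREDICATES]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_vector(subj, obj):
    """LAM stand-in (assumed form): sigmoid gate from the subject-object pair."""
    return [1.0 / (1.0 + math.exp(-(s + o)))
            for s, o in zip(EMB_ATT[subj], EMB_ATT[obj])]


def predicate_prior(subj, obj):
    """EEM stand-in (assumed form): predicate distribution from the pair."""
    pair = EMB_EXP[subj] + EMB_EXP[obj]  # list concatenation
    return softmax([sum(w * x for w, x in zip(row, pair)) for row in W_EXP])


def global_context(labels):
    """LCM stand-in (assumed form): mean embedding over all objects in the image."""
    return [sum(EMB_CTX[name][i] for name in labels) / len(labels)
            for i in range(DIM)]


def refine(visual, base_logits, subj, obj, labels):
    """Gate the visual feature, add context, and bias the base model's logits."""
    att = attention_vector(subj, obj)
    ctx = global_context(labels)
    feature = [v * a + c for v, a, c in zip(visual, att, ctx)]
    prior = predicate_prior(subj, obj)
    logits = [b + math.log(p) for b, p in zip(base_logits, prior)]
    return feature, softmax(logits)


feature, probs = refine(rand_vec(DIM), [0.0, 0.0, 0.0],
                        "person", "horse", OBJECTS)
print(feature, probs)
```

In this sketch the only inputs besides the base model's visual feature and logits are object category labels, which mirrors the abstract's point that no extra knowledge is needed. With all base logits equal, the final distribution reduces to the language-derived prior.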

Original language: English
Pages (from-to): 26126-26138
Number of pages: 13
Journal: Applied Intelligence
Volume: 53
Issue number: 21
DOI
Publication status: Published - Nov 2023
Externally published
