Knowledge Graph Enhancement for Fine-Grained Zero-Shot Learning on ImageNet21K

  • Xi'an Jiaotong University

Research output: Contribution to journal, Article, peer-review

5 Scopus citations

Abstract

Fine-grained zero-shot learning (ZSL) on the large-scale dataset ImageNet21K is an important task with promising prospects in many real-world scenarios. One typical solution is to model knowledge passing explicitly with a Knowledge Graph (KG) that transfers knowledge from seen to unseen classes. By analyzing the hierarchical structure and the word descriptions of ImageNet21K, we find that noisy semantic information, the sparseness of seen classes, and the lack of supervision for unseen classes make the knowledge passing insufficient, which limits KG-based fine-grained ZSL. To resolve this problem, we enhance the knowledge passing in three ways. First, we use more powerful models, namely a Large Language Model and a Vision-Language Model, to obtain more reliable semantic embeddings. Second, we propose a strategy that globally enhances the knowledge graph based on the convex combination relationship of the semantic embeddings. It adds edges between non-kinship seen and unseen classes that are strongly correlated and assigns an importance score to each edge. Third, based on the enhanced knowledge graph, we present a novel regularizer that locally enhances the knowledge passing during training. Extensive comparative evaluations demonstrate the advantages of our method over state-of-the-art approaches.
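
The abstract does not spell out how the convex combination relationship is computed, so the following is only an illustrative sketch of one plausible reading, not the authors' method. It assumes each unseen-class embedding is approximated as a convex combination of seen-class embeddings (non-negative weights summing to one), solved here with projected gradient descent, and that weights above a cutoff become weighted edges. The function names, the `threshold` parameter, and the toy embeddings are all hypothetical.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]  # last index that stays positive
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)


def convex_weights(target, seen, steps=500):
    """Approximate `target` as a convex combination of the rows of `seen`."""
    n = seen.shape[0]
    w = np.full(n, 1.0 / n)                   # start from uniform weights
    # Step size 1/L, where L bounds the gradient's Lipschitz constant
    # for the objective ||seen.T @ w - target||^2.
    lip = 2.0 * np.linalg.norm(seen, 2) ** 2
    for _ in range(steps):
        grad = 2.0 * seen @ (seen.T @ w - target)
        w = project_to_simplex(w - grad / lip)
    return w


def enhanced_edges(unseen, seen, threshold=0.05):
    """Return (seen_idx, unseen_idx, score) for weights above `threshold`.

    Assumption: the convex weight itself serves as the edge importance score.
    """
    edges = []
    for u_idx, target in enumerate(unseen):
        w = convex_weights(target, seen)
        for s_idx in np.nonzero(w > threshold)[0]:
            edges.append((int(s_idx), u_idx, float(w[s_idx])))
    return edges


# Toy data: three seen classes with orthogonal embeddings, one unseen class
# that lies between the first two.
seen = np.eye(3)
unseen = np.array([[0.7, 0.3, 0.0]])
edges = enhanced_edges(unseen, seen)
```

In this toy case the unseen class is linked only to the two seen classes it actually mixes, and the third seen class receives no edge. Real semantic embeddings would be high-dimensional and rarely admit an exact convex decomposition, so the cutoff would need tuning.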

Original language: English
Pages (from-to): 9090-9101
Number of pages: 12
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 34
Issue number: 10
State: Published - 2024

Keywords

  • Fine-grained zero-shot learning
  • graph convolutional neural network
  • knowledge graph
