Density-aware and Depth-aware Visual Representation for Zero-Shot Object Counting

  • Fang Nan
  • Feng Tian
  • Ni Zhang
  • Nian Liu
  • Haonan Miao
  • Guang Dai
  • Mengmeng Wang

  • Xi'an Jiaotong University
  • Northwestern Polytechnical University Xian
  • Mohamed Bin Zayed University of Artificial Intelligence
  • State Grid Corporation of China
  • Zhejiang University of Technology

Research output: Contribution to journal › Conference article › peer-review

1 Scopus citation

Abstract

Previous methods for zero-shot object counting often use CLIP semantic classifiers with class names, but they ignore the density and depth knowledge that is crucial for counting. We therefore propose a density-aware and depth-aware prompt counting model, which captures density information by learning density-aware prompts with a density-aware contrastive loss and incorporates depth guidance through predefined depth-aware prompts. To facilitate training, we design two strategies, one for the standard counting loss and one for the contrastive loss: the former initially prioritizes larger and sparser objects and gradually shifts its focus to smaller and denser objects, while the latter adopts coarse-to-fine density learning. In addition, we construct a dataset named LVIS-372, which covers more real-world scenarios and has a more balanced instance distribution than existing datasets. Experimental results demonstrate the effectiveness of the proposed method.
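
The curriculum idea for the counting loss (sparser, larger objects first, then denser, smaller ones) can be illustrated with a toy sketch. This is not the authors' formulation: the abstract does not give the schedule, so the linear interpolation below, the per-sample `difficulty` score in [0, 1] (0 for sparse scenes with large objects, 1 for dense scenes with small objects), the `progress` variable, and the use of a squared error on scalar counts rather than on density maps are all assumptions made only for illustration.

```python
from typing import Sequence


def curriculum_weight(difficulty: float, progress: float) -> float:
    """Hypothetical sample weight for a curriculum over object density.

    difficulty: 0.0 = sparse scene with large objects, 1.0 = dense scene
                with small objects (assumed normalization).
    progress:   fraction of training completed, in [0, 1].

    Early in training (progress near 0) easy samples get weight near 1;
    late in training (progress near 1) hard samples get weight near 1.
    """
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty must lie in [0, 1]")
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must lie in [0, 1]")
    # Linear interpolation between "favor easy" and "favor hard" (assumed).
    return (1.0 - progress) * (1.0 - difficulty) + progress * difficulty


def weighted_counting_loss(
    pred_counts: Sequence[float],
    true_counts: Sequence[float],
    difficulties: Sequence[float],
    progress: float,
) -> float:
    """Weighted mean squared count error under the assumed curriculum."""
    weights = [curriculum_weight(d, progress) for d in difficulties]
    total = sum(weights)
    if total == 0.0:
        # Every sample currently has zero weight; nothing contributes.
        return 0.0
    squared_errors = [(p - t) ** 2 for p, t in zip(pred_counts, true_counts)]
    return sum(w * e for w, e in zip(weights, squared_errors)) / total


if __name__ == "__main__":
    preds, truths = [2.0, 10.0], [3.0, 14.0]
    diffs = [0.0, 1.0]  # first image sparse, second image dense
    for step in (0.0, 0.5, 1.0):
        print(step, weighted_counting_loss(preds, truths, diffs, step))
```

With this toy schedule, the loss at the start of training is driven entirely by the sparse image and at the end entirely by the dense one; a real schedule would more likely be smoother and would operate on predicted density maps.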

Keywords

  • CLIP
  • Depth
  • Object Counting
  • Zero-shot
