Learning scalable Omni-scale distribution for crowd counting

Research output: Contribution to journal › Article › peer-review

Abstract

Crowd counting is challenged by large appearance variations of individuals in uncontrolled scenes. Many previous approaches address this problem by learning multi-scale features and concatenating them to improve performance. However, such naive fusion is heuristic and suboptimal across a wide range of scale variations. In this paper, we propose a novel feature fusion scheme, called Scalable Omni-scale Distribution Fusion (SODF), which leverages the different scale distributions of multi-layer feature maps to approximate the real distribution of target scales. Inspired by the Gaussian Mixture Model, which addresses multi-scale feature fusion from a probabilistic perspective, our SODF module adaptively integrates multi-layer feature maps without embedding any multi-scale structures. The SODF module comprises two major components: an interaction block that perceives the real distribution and an assignment block that assigns weights to the multi-layer or multi-column feature maps. The proposed SODF module is scalable, lightweight, and plug-and-play, and can be flexibly embedded into other counting networks. In addition, we design a counting model (SODF-Net) that combines the SODF module with a multi-layer structure. Extensive experiments on four benchmark datasets show that the proposed SODF-Net performs favorably against state-of-the-art counting models. Furthermore, the proposed SODF module efficiently improves the prediction performance of canonical counting networks, e.g., MCNN, CSRNet, and CAN.
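
As a rough illustration of the mixture-style fusion idea described above, the following minimal sketch weights several same-shaped feature maps with per-pixel coefficients that sum to one, in the spirit of mixture weights in a Gaussian Mixture Model. It is not the authors' SODF implementation: the function name `fuse_feature_maps`, the scoring matrix `proj`, and the use of a simple linear scoring step followed by a softmax are all assumptions made for illustration only.

```python
import numpy as np


def softmax(x, axis=0):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def fuse_feature_maps(feats, proj):
    """Fuse K feature maps with per-pixel mixture weights (illustrative only).

    feats: array of shape (K, C, H, W), one feature map per layer or column.
    proj:  array of shape (K, C), hypothetical scoring weights per branch.
    """
    # Stand-in for an "interaction" step: one scalar score per branch and pixel.
    scores = np.einsum("kchw,kc->khw", feats, proj)
    # Stand-in for an "assignment" step: weights over the K branches
    # that are non-negative and sum to 1 at every pixel.
    weights = softmax(scores, axis=0)
    # Weighted sum of the branches, broadcast over the channel axis.
    fused = (weights[:, None, :, :] * feats).sum(axis=0)
    return fused, weights


rng = np.random.default_rng(0)
feats = rng.standard_normal((3, 8, 4, 4))  # 3 branches, 8 channels, 4x4 maps
proj = rng.standard_normal((3, 8))
fused, weights = fuse_feature_maps(feats, proj)
```

With all-zero scoring weights the branches receive equal weight, so the fused map reduces to the plain average of the inputs; a learned scoring step would instead let the weights vary with image content.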

Original language: English
Article number: 104387
Journal: Journal of Visual Communication and Image Representation
Volume: 107
State: Published - Mar 2025

Keywords

  • Crowd counting
  • Density map estimation
  • Feature fusion
  • Omni-scale distribution
