Skip to main navigation Skip to search Skip to main content

InfMAE: A Foundation Model in the Infrared Modality

  • Fangcen Liu
  • , Chenqiang Gao
  • , Yaming Zhang
  • , Junjie Guo
  • , Jinghao Wang
  • , Deyu Meng
  • Chongqing University of Posts and Telecommunications
  • Sun Yat-Sen University

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

15 Scopus citations

Abstract

In recent years, foundation models have swept the computer vision field, facilitating the advancement of various tasks within different modalities. However, effectively designing an infrared foundation model remains an open question. In this paper, we introduce InfMAE, a foundation model tailored specifically for the infrared modality. Initially, we present Inf30, an infrared dataset developed to mitigate the scarcity of large-scale data for self-supervised learning within the infrared vision community. Moreover, considering the intrinsic characteristics of infrared images, we design an information-aware masking strategy. It allows for a greater emphasis on the regions with richer information in infrared images during the self-supervised learning process, which is conducive to learning strong representations. Additionally, to enhance generalization capabilities in downstream tasks, we employ a multi-scale encoder for latent representation learning. Finally, we develop an infrared decoder to reconstruct images. Extensive experiments show that our proposed method InfMAE outperforms other supervised and self-supervised learning methods in three key downstream tasks: infrared image semantic segmentation, object detection, and small target detection.

Original languageEnglish
Title of host publicationComputer Vision – ECCV 2024 - 18th European Conference, Proceedings
EditorsAleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, Gül Varol
PublisherSpringer Science and Business Media Deutschland GmbH
Pages420-437
Number of pages18
ISBN (Print)9783031726484
DOIs
StatePublished - 2025
Event18th European Conference on Computer Vision, ECCV 2024 - Milan, Italy
Duration: 29 Sep 20244 Oct 2024

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume15076 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference18th European Conference on Computer Vision, ECCV 2024
Country/TerritoryItaly
CityMilan
Period29/09/244/10/24

Keywords

  • Infrared Foundation Model
  • Infrared Image Segmentation
  • Infrared Object Detection
  • Infrared Small Target Detection

Fingerprint

Dive into the research topics of 'InfMAE: A Foundation Model in the Infrared Modality'. Together they form a unique fingerprint.

Cite this