BrainCLIP: Brain Representation via CLIP for Generic Natural Visual Stimulus Decoding

Research output: Contribution to journal › Article › peer-review


Abstract

Functional Magnetic Resonance Imaging (fMRI) presents challenges due to limited paired samples and low signal-to-noise ratios, particularly in tasks involving reconstructing natural images or decoding their semantic content. To address these challenges, we introduce BrainCLIP, an innovative fMRI-based brain decoding model. BrainCLIP leverages the cross-modal generalization ability of Contrastive Language-Image Pre-training (CLIP) to bridge brain activity, images, and text for the first time. Our experiments demonstrate CLIP's effectiveness in diverse brain decoding tasks, including zero-shot visual category decoding, fMRI-image/text alignment, and fMRI-to-image generation. The core objective of BrainCLIP is to train a mapping network that translates fMRI patterns into the unified CLIP embedding space, achieved by integrating visual and textual supervision. Our results show that this approach significantly enhances performance in tasks such as fMRI-text alignment and fMRI-based image generation. Notably, BrainCLIP surpasses BraVL, a recent multi-modal method, in zero-shot visual category decoding. Moreover, BrainCLIP demonstrates strong capability in reconstructing visual stimuli with high semantic fidelity, competing favorably with state-of-the-art methods in capturing high-level semantic features during fMRI-based natural image reconstruction.
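To make the core idea concrete — mapping fMRI patterns into CLIP's joint embedding space and aligning them with a CLIP-style symmetric contrastive (InfoNCE) objective — here is a minimal NumPy sketch. The linear mapper, feature dimensions, and temperature are illustrative assumptions for exposition, not the authors' actual architecture or training setup.

```python
import numpy as np

def l2norm(x, eps=1e-8):
    # Normalize rows to unit length, as CLIP does before computing similarities.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def map_fmri(W, fmri):
    # Hypothetical linear mapping network: fMRI voxels -> CLIP embedding space.
    # (BrainCLIP's actual mapper is a learned network; a matrix stands in here.)
    return fmri @ W

def similarity(brain_emb, clip_emb, temperature=0.07):
    # Cosine-similarity matrix scaled by a temperature, CLIP-style.
    return (l2norm(brain_emb) @ l2norm(clip_emb).T) / temperature

def contrastive_loss(sim):
    # Symmetric InfoNCE: cross-entropy in both directions, with matched
    # (fMRI, CLIP-embedding) pairs on the diagonal as positives.
    n = sim.shape[0]
    idx = np.arange(n)
    loss_brain = -log_softmax(sim, axis=1)[idx, idx].mean()
    loss_clip = -log_softmax(sim, axis=0)[idx, idx].mean()
    return 0.5 * (loss_brain + loss_clip)

# Toy batch: 4 fMRI patterns of 32 "voxels", 8-dim CLIP-like embeddings.
rng = np.random.default_rng(0)
fmri = rng.standard_normal((4, 32))
W = rng.standard_normal((32, 8))
clip_targets = map_fmri(W, fmri)  # perfectly aligned targets, for illustration

sim = similarity(map_fmri(W, fmri), clip_targets)
retrieved = sim.argmax(axis=1)    # perfect retrieval by construction
loss = contrastive_loss(sim)
```

Once trained on real (fMRI, image/text) pairs, the same similarity matrix supports zero-shot category decoding: embed each candidate category's text with CLIP and pick the one most similar to the mapped fMRI pattern.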

Original language: English
Pages (from-to): 3962-3972
Number of pages: 11
Journal: IEEE Transactions on Medical Imaging
Volume: 44
Issue number: 10
DOIs
State: Published - Oct 2025

Keywords

  • Brain decoding
  • CLIP
  • cross-modal
  • visual-linguistic representation
