Abstract
Functional Magnetic Resonance Imaging (fMRI) presents challenges due to limited paired samples and low signal-to-noise ratios, particularly in tasks involving reconstructing natural images or decoding their semantic content. To address these challenges, we introduce BrainCLIP, an innovative fMRI-based brain decoding model. BrainCLIP leverages Contrastive Language-Image Pre-training’s (CLIP) cross-modal generalization abilities to bridge brain activity, images, and text for the first time. Our experiments demonstrate CLIP’s effectiveness in diverse brain decoding tasks, including zero-shot visual category decoding, fMRI-image/text alignment, and fMRI-to-image generation. The core objective of BrainCLIP is to train a mapping network that translates fMRI patterns into a unified CLIP embedding space, achieved through visual and textual supervision integration. Our experiments highlight that this approach significantly enhances performance in tasks such as fMRI-text alignment and fMRI-based image generation. Notably, BrainCLIP surpasses BraVL, a recent multi-modal method, in zero-shot visual category decoding. Moreover, BrainCLIP demonstrates strong capability in reconstructing visual stimuli with high semantic fidelity, competing favorably with state-of-the-art methods in capturing high-level semantic features during fMRI-based natural image reconstruction.
| Original language | English |
|---|---|
| Pages (from-to) | 3962-3972 |
| Number of pages | 11 |
| Journal | IEEE Transactions on Medical Imaging |
| Volume | 44 |
| Issue number | 10 |
| DOIs | |
| State | Published - Oct 2025 |
Keywords
- Brain decoding
- CLIP
- cross-modal
- visual-linguistic representation
Fingerprint
Dive into the research topics of 'BrainCLIP: Brain Representation via CLIP for Generic Natural Visual Stimulus Decoding'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver