LES-CLIP: A Lightweight Emotion-Sensitive Adaptation of CLIP for Precise Similar Emotion Discrimination

  • Xiao Fu
  • Pengyu Wang
  • Wei Xi
  • Kun Zhao
  • Jiadong Feng
  • Jizhong Zhao
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

CLIP has been widely adopted in affective computing for its strong vision-language representation capabilities. However, it fails to accurately distinguish visually similar yet label-distinct facial expressions. This limitation is rooted in CLIP's encoding paradigm and large-scale contrastive pretraining, which bias the model toward focusing primarily on globally salient visual features and aligning them with broad semantic concepts. Such alignment overlooks subtle facial variations and induces representational shortcuts, where emotionally distinct categories are projected into overlapping regions of the shared semantic space. This semantic entanglement severely compromises the model's ability to preserve emotional separability. We propose LES-CLIP, a Lightweight and Emotion-Sensitive framework that adapts CLIP for precise discrimination of similar emotions. LES-CLIP achieves fine-grained emotional sensitivity using only simple text prompts and facial images. It introduces three novel components: 1) an Emotion-Sensitive Adaptive Mixture-of-Experts, which pre-adapts representations for subtle expression discrimination; 2) a Prompt-Guided Emotion Discrimination module that activates CLIP's visual sensitivity to fine-grained facial cues; and 3) a LES hybrid loss that guides contrastive learning toward accurate emotion-label alignment. Extensive experiments demonstrate that LES-CLIP achieves state-of-the-art performance, reaching 70.18% on the 8-class AffectNet dataset. Moreover, it converges faster and requires significantly fewer parameters.
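The abstract does not detail the LES hybrid loss itself, but it states that the framework steers contrastive learning toward emotion-label alignment on top of CLIP. As background, the sketch below shows the standard symmetric CLIP-style contrastive objective over image and text-prompt embeddings, which is the base objective a loss like this would extend. The function name, temperature value, and NumPy formulation are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def clip_style_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of matched image/text pairs.

    Row i of image_emb and row i of text_emb are assumed to be a positive
    pair (e.g. a face image and its emotion prompt); all other rows in the
    batch act as negatives. Illustrative sketch, not the LES hybrid loss.
    """
    # L2-normalize so the dot product is cosine similarity
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # Pairwise similarity logits, scaled by temperature
    logits = img @ txt.T / temperature
    n = logits.shape[0]

    def xent(l):
        # Cross-entropy with the diagonal (matched pair) as the target class
        l = l - l.max(axis=1, keepdims=True)          # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[np.arange(n), np.arange(n)].mean()

    # Average the image-to-text and text-to-image directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

A loss of this form pulls each image toward its own emotion prompt and pushes it away from the others; the paper's contribution, per the abstract, is reshaping this objective so that visually similar but label-distinct expressions stay separable.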

Original language: English
Title of host publication: MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025
Publisher: Association for Computing Machinery, Inc
Pages: 5765-5774
Number of pages: 10
ISBN (Electronic): 9798400720352
State: Published - 27 Oct 2025
Event: 33rd ACM International Conference on Multimedia, MM 2025 - Dublin, Ireland
Duration: 27 Oct 2025 - 31 Oct 2025

Publication series

Name: MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025

Conference

Conference: 33rd ACM International Conference on Multimedia, MM 2025
Country/Territory: Ireland
City: Dublin
Period: 27/10/25 - 31/10/25

Keywords

  • CLIP
  • contrastive learning
  • emotion discrimination
  • emotion-sensitive
  • facial expression recognition
  • lightweight adaptation
