COSDA: Covariance regularized semantic data augmentation for self-supervised visual representation learning

Research output: Contribution to journal › Article › peer-review


Abstract

Recent contrastive-learning-based self-supervised learning has seen significant improvements from extensive data augmentation strategies, particularly in the generation of positive pairs. However, current techniques operate primarily at the pixel level and are confined to basic spatial and color transformations, so they cannot express more complex semantic alterations such as repositioning, rotating, or recoloring an object within the image. Consequently, the resulting positive pairs are less informative for learning features that are invariant to such semantic variations. In this work, we introduce COvariance Regularized Semantic Data Augmentation (COSDA), a methodology that generates a diverse collection of feature embeddings to serve as positives for an anchor point. Through Gaussian sampling in the deep feature space, the generated features carry semantic characteristics distinct from those of the anchor while preserving its category identity. By theoretically analyzing the scenario in which the number of generated positive features approaches infinity, we establish an upper bound on the InfoNCE loss and optimize this bound without explicit feature generation. Rigorous experiments on datasets of varying scales, together with downstream detection and segmentation tasks, corroborate the efficacy of COSDA.
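The closed-form trick described in the abstract, bounding the expected InfoNCE loss as the number of Gaussian-sampled positives goes to infinity, can be illustrated with a short sketch. The PyTorch snippet below is a minimal illustration under stated assumptions, not the paper's released code: the names (estimate_covariance, cosda_infonce_upper_bound), the batch-level covariance estimate, and the strength knob lambda_ are hypothetical stand-ins for whatever COSDA actually uses. It shows how, if a positive is perturbed as p~ ~ N(p, lambda * Sigma), the Gaussian moment-generating function plus Jensen's inequality yield a sampling-free surrogate in which the positive logit in the denominator simply gains a covariance penalty.

    # Minimal sketch of implicit semantic augmentation for InfoNCE.
    # All names and the exact form of the bound are assumptions made
    # for illustration; they are not taken from the paper.
    import torch
    import torch.nn.functional as F


    def estimate_covariance(feats: torch.Tensor) -> torch.Tensor:
        """Batch estimate of the (D, D) feature covariance matrix.

        feats: (N, D) embeddings. The paper presumably maintains a more
        careful covariance estimate; a plain batch estimate is used here
        purely for illustration.
        """
        centered = feats - feats.mean(dim=0, keepdim=True)
        return centered.t() @ centered / max(feats.shape[0] - 1, 1)


    def cosda_infonce_upper_bound(
        anchors: torch.Tensor,     # (N, D) anchor embeddings z_i
        positives: torch.Tensor,   # (N, D) positive embeddings p_i
        temperature: float = 0.2,
        lambda_: float = 0.5,      # augmentation strength (hypothetical knob)
    ) -> torch.Tensor:
        """Closed-form surrogate for InfoNCE with infinitely many
        Gaussian-sampled positives.

        For p~ ~ N(p, lambda * Sigma), the Gaussian MGF gives
            E[exp(z . p~ / t)] = exp(z . p / t + lambda/(2 t^2) z^T Sigma z),
        so Jensen's inequality bounds the expected loss without ever
        materializing augmented features.
        """
        z = F.normalize(anchors, dim=1)
        p = F.normalize(positives, dim=1)
        sigma = estimate_covariance(p)

        logits = z @ p.t() / temperature   # (N, N); off-diagonals act as negatives
        pos_logit = logits.diag()          # numerator term z_i . p_i / t

        # Covariance penalty from the MGF: lambda/(2 t^2) * z_i^T Sigma z_i.
        penalty = lambda_ / (2 * temperature ** 2) * ((z @ sigma) * z).sum(dim=1)

        # Denominator: implicitly augmented positive plus in-batch negatives.
        idx = torch.arange(z.shape[0])
        aug_logits = logits.clone()
        aug_logits[idx, idx] = aug_logits[idx, idx] + penalty

        return (-pos_logit + torch.logsumexp(aug_logits, dim=1)).mean()


    # Usage sketch: two augmented views of a batch as anchors and positives.
    z = torch.randn(256, 128, requires_grad=True)
    p = torch.randn(256, 128)
    loss = cosda_infonce_upper_bound(z, p)
    loss.backward()

Because the penalty depends only on first- and second-order feature statistics, no augmented embeddings are ever sampled, which is what lets the infinite-positive limit be optimized at ordinary InfoNCE cost.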

Original language: English
Article number: 113080
Journal: Knowledge-Based Systems
Volume: 311
State: Published - 28 Feb 2025

Keywords

  • Contrastive learning
  • Self-supervised visual representation learning
  • Semantic data augmentation
