TY - JOUR
T1 - RepCo
T2 - Replenish sample views with better consistency for contrastive learning
AU - Lei, Xinyu
AU - Liu, Longjun
AU - Zhang, Yi
AU - Jia, Puhang
AU - Zhang, Haonan
AU - Zheng, Nanning
N1 - Publisher Copyright:
© 2023 Elsevier Ltd
PY - 2023/11
Y1 - 2023/11
N2 - Contrastive learning methods aim to learn shared representations by minimizing distances between positive pairs and maximizing distances between negative pairs in the embedding space. A key problem in achieving better contrastive learning performance is designing appropriate sample pairs. In most previous works, random cropping of the input image is used to obtain two views as a positive pair. However, such strategies lead to suboptimal performance, since the sampled crops may carry inconsistent semantic information, which degrades the quality of the contrastive views. To address this limitation, we explore replenishing sample views with better consistency within the image and propose a novel self-supervised learning (SSL) framework, RepCo. Instead of searching for semantically consistent patches between two different views, we select patches on the same image as a replenishment of positive/negative pairs: we encourage patches that are similar but come from different positions to act as positive pairs, and force patches that are dissimilar but come from adjacent positions to have different representations, i.e., we construct negative pairs to enrich the learned representations. Our method effectively generates high-quality contrastive views, explores the untapped semantic consistency of images, and provides more informative representations for downstream tasks. Experiments on a range of downstream tasks show that our approach achieves gains of +2.1 AP50 (COCO pre-trained) and +1.6 AP50 (ImageNet pre-trained) on Pascal VOC object detection, and +2.3 mIoU on Cityscapes semantic segmentation.
AB - Contrastive learning methods aim to learn shared representations by minimizing distances between positive pairs and maximizing distances between negative pairs in the embedding space. A key problem in achieving better contrastive learning performance is designing appropriate sample pairs. In most previous works, random cropping of the input image is used to obtain two views as a positive pair. However, such strategies lead to suboptimal performance, since the sampled crops may carry inconsistent semantic information, which degrades the quality of the contrastive views. To address this limitation, we explore replenishing sample views with better consistency within the image and propose a novel self-supervised learning (SSL) framework, RepCo. Instead of searching for semantically consistent patches between two different views, we select patches on the same image as a replenishment of positive/negative pairs: we encourage patches that are similar but come from different positions to act as positive pairs, and force patches that are dissimilar but come from adjacent positions to have different representations, i.e., we construct negative pairs to enrich the learned representations. Our method effectively generates high-quality contrastive views, explores the untapped semantic consistency of images, and provides more informative representations for downstream tasks. Experiments on a range of downstream tasks show that our approach achieves gains of +2.1 AP50 (COCO pre-trained) and +1.6 AP50 (ImageNet pre-trained) on Pascal VOC object detection, and +2.3 mIoU on Cityscapes semantic segmentation.
KW - Contrastive learning
KW - Sampling strategy
KW - Self-supervised pretraining
UR - https://www.scopus.com/pages/publications/85172343737
U2 - 10.1016/j.neunet.2023.09.004
DO - 10.1016/j.neunet.2023.09.004
M3 - Article
C2 - 37757725
AN - SCOPUS:85172343737
SN - 0893-6080
VL - 168
SP - 171
EP - 179
JO - Neural Networks
JF - Neural Networks
ER -