
CoCoSoDa: Effective Contrastive Learning for Code Search

  • Ensheng Shi
  • Yanlin Wang
  • Wenchao Gu
  • Lun Du
  • Hongyu Zhang
  • Shi Han
  • Dongmei Zhang
  • Hongbin Sun
  • Xi'an Jiaotong University
  • Sun Yat-Sen University
  • Chinese University of Hong Kong
  • Microsoft USA
  • Chongqing University

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

52 Scopus citations

Abstract

Code search aims to retrieve semantically relevant code snippets for a given natural language query. Recently, many approaches employing contrastive learning have shown promising results on code representation learning and greatly improved the performance of code search. However, there is still a lot of room for improvement in using contrastive learning for code search. In this paper, we propose CoCoSoDa to effectively utilize contrastive learning for code search via two key factors in contrastive learning: data augmentation and negative samples. Specifically, soft data augmentation dynamically masks some tokens in the input sequences, or replaces them with their types, to generate positive samples. A momentum mechanism generates large and consistent representations of negative samples in a mini-batch by maintaining a queue and a momentum encoder. In addition, multimodal contrastive learning is used to pull together the representations of code-query pairs and push apart unpaired code snippets and queries. We conduct extensive experiments to evaluate the effectiveness of our approach on a large-scale dataset with six programming languages. Experimental results show that: (1) CoCoSoDa outperforms 18 baselines and especially exceeds CodeBERT, GraphCodeBERT, and UniXcoder by 13.3%, 10.5%, and 5.9% on average MRR scores, respectively. (2) The ablation studies show the effectiveness of each component of our approach. (3) We adapt our techniques to several different pre-trained models such as RoBERTa, CodeBERT, and GraphCodeBERT and observe a significant boost in their performance in code search. (4) Our model performs robustly under different hyper-parameters. Furthermore, we perform qualitative and quantitative analyses to explore the reasons behind the good performance of our model.
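
The three ingredients named in the abstract (soft data augmentation, a momentum-updated encoder with a queue of negatives, and a contrastive objective over query-code pairs) can be illustrated with a minimal sketch. This is not the authors' implementation: it is a plain-Python toy built only from the abstract's description. The `<mask>` token, the function names, the 0.15 alteration probability, the 0.999 momentum coefficient, and the 0.07 temperature are all assumptions chosen for illustration, and the loss shown is the generic InfoNCE form rather than the paper's exact objective.

```python
import math
import random
from collections import deque

MASK = "<mask>"  # hypothetical mask token, not taken from the paper


def soft_augment(tokens, types, p=0.15, rng=random):
    """Return a new view of `tokens`: each token is, with probability p,
    either masked or replaced by its type label (chosen 50/50 here).
    Calling this again yields a different view, hence "dynamic"."""
    view = []
    for token, token_type in zip(tokens, types):
        if rng.random() < p:
            view.append(MASK if rng.random() < 0.5 else token_type)
        else:
            view.append(token)
    return view


def momentum_update(key_params, query_params, m=0.999):
    """Move the momentum (key) encoder slowly toward the trained encoder:
    key <- m * key + (1 - m) * query, applied element-wise."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _normalize(v):
    norm = math.sqrt(_dot(v, v)) or 1.0  # leave an all-zero vector unchanged
    return [x / norm for x in v]


def info_nce(anchor, positive, negatives, temperature=0.07):
    """Generic InfoNCE loss for one anchor: the negative log-softmax of the
    positive's cosine similarity against the positive plus all negatives."""
    a = _normalize(anchor)
    logits = [_dot(a, _normalize(positive)) / temperature]
    logits += [_dot(a, _normalize(n)) / temperature for n in negatives]
    top = max(logits)  # subtract the max for numerical stability
    log_sum = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_sum - logits[0]


def make_negative_queue(size):
    """Fixed-size FIFO of past key embeddings; the oldest entries drop out."""
    return deque(maxlen=size)
```

In a multimodal setting one would, under these assumptions, take a query embedding as `anchor`, the embedding of its paired code snippet (produced by the momentum encoder) as `positive`, and the queue contents as `negatives`, then repeat with the roles of query and code swapped. After each training step the new key embeddings are appended to the queue and `momentum_update` is applied to the key encoder's parameters.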

Original language: English
Title of host publication: Proceedings - 2023 IEEE/ACM 45th International Conference on Software Engineering, ICSE 2023
Publisher: IEEE Computer Society
Pages: 2198-2210
Number of pages: 13
ISBN (Electronic): 9781665457019
DOIs
State: Published - 26 Jul 2023
Event: 45th IEEE/ACM International Conference on Software Engineering, ICSE 2023 - Melbourne, Australia
Duration: 15 May 2023 to 16 May 2023

Publication series

Name: Proceedings - International Conference on Software Engineering
ISSN (Print): 0270-5257

Conference

Conference: 45th IEEE/ACM International Conference on Software Engineering, ICSE 2023
Country/Territory: Australia
City: Melbourne
Period: 15/05/23 to 16/05/23

Keywords

  • code search
  • contrastive learning
  • momentum mechanism
  • soft data augmentation
