B2-Sampling: Fusing Balanced and Biased Sampling for Graph Contrastive Learning

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

12 Scopus citations

Abstract

Graph contrastive learning (GCL), which aims for an embedding space where semantically similar nodes are closer together, has been widely applied to graph-structured data. Researchers have proposed many approaches to define positive and negative pairs (i.e., semantically similar and dissimilar pairs) on the graph, which serve as labels for learning embedding distances. Despite their effectiveness, these approaches usually suffer from two typical learning challenges. First, the number of candidate negative pairs is enormous, so it is non-trivial to select representative ones to train the model effectively. Second, the heuristics (e.g., graph views or meta-path patterns) used to define positive and negative pairs are sometimes unreliable, introducing considerable noise into both "labelled" positive and negative pairs. In this work, we propose a novel sampling approach, B2-Sampling, to address the above challenges in a unified way. On the one hand, we use balanced sampling to select the most representative negative pairs with respect to both topological and embedding diversity. On the other hand, we use biased sampling to learn and correct the labels of the most error-prone negative pairs during training. Balanced and biased sampling can be applied iteratively to discriminate and correct training pairs, boosting the performance of GCL models. B2-Sampling is designed as a framework that supports many known GCL models. Our extensive experiments on node classification, node clustering, and graph classification tasks show that B2-Sampling significantly improves the performance of GCL models with acceptable runtime overhead. Our website https://sites.google.com/view/b2-sampling/home provides access to our code and additional experimental results.
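The two sampling steps described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the helper names (`balanced_negatives`, `biased_relabel`), the similarity-bin criterion, and the hop-distance tie-breaking are all illustrative assumptions standing in for the paper's actual balanced-sampling and biased-sampling rules. The sketch only conveys the two ideas: spreading selected negatives across both embedding and topological diversity, then flagging the highest-similarity "negatives" as the error-prone pairs whose labels may need correction.

```python
import numpy as np

def balanced_negatives(emb, anchor, candidates, hops, k):
    """Sketch of balanced sampling (hypothetical criterion, not the
    paper's): pick k negatives spread over k embedding-similarity bins,
    preferring under-represented hop distances within each bin."""
    a = emb[anchor] / np.linalg.norm(emb[anchor])
    c = emb[candidates] / np.linalg.norm(emb[candidates], axis=1, keepdims=True)
    sims = c @ a                              # cosine similarity to the anchor
    order = np.argsort(sims)                  # sort candidates by similarity
    bins = np.array_split(order, k)           # k disjoint similarity bins
    chosen = []
    for b in bins:
        if len(b) == 0:
            continue
        # hop distances already represented among chosen negatives
        seen = [hops[candidates[i]] for i in chosen]
        # prefer the candidate whose hop distance is rarest so far
        best = max(b, key=lambda i: -seen.count(hops[candidates[i]]))
        chosen.append(best)
    return [candidates[i] for i in chosen]

def biased_relabel(emb, anchor, negatives, m):
    """Sketch of biased sampling: flag the m 'negatives' most similar
    to the anchor as likely false negatives (the error-prone pairs)."""
    a = emb[anchor] / np.linalg.norm(emb[anchor])
    n = emb[negatives] / np.linalg.norm(emb[negatives], axis=1, keepdims=True)
    sims = n @ a
    suspect = np.argsort(sims)[-m:]           # highest-similarity negatives
    return [negatives[i] for i in suspect]

# Toy usage on synthetic embeddings and hop distances
emb = np.random.default_rng(0).normal(size=(20, 8))
hops = np.random.default_rng(1).integers(1, 5, size=20)
negs = balanced_negatives(emb, 0, list(range(1, 20)), hops, k=4)
suspects = biased_relabel(emb, 0, negs, m=2)
```

In the paper's framework the two steps alternate during training; here they are shown as one pass purely to make the division of labor concrete.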

Original language: English
Title of host publication: KDD 2023 - Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 1489-1500
Number of pages: 12
ISBN (Electronic): 9798400701030
DOIs
State: Published - 4 Aug 2023
Event: 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2023 - Long Beach, United States
Duration: 6 Aug 2023 – 10 Aug 2023

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
ISSN (Print): 2154-817X

Conference

Conference: 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2023
Country/Territory: United States
City: Long Beach
Period: 6/08/23 – 10/08/23

Keywords

  • graph contrastive learning
  • negative sampling
  • neural network
