MGDA: Model-based Goal Data Augmentation for Offline Goal-conditioned Weighted Supervised Learning

Research output: Contribution to journal > Conference article > peer-review

Abstract

Recently, a state-of-the-art series of algorithms, Goal-Conditioned Weighted Supervised Learning (GCWSL) methods, has been introduced to address the challenges inherent in offline goal-conditioned reinforcement learning (RL). GCWSL optimizes a lower bound on the goal-conditioned RL objective and has demonstrated exceptional performance across a range of goal-reaching tasks, offering a simple, effective, and stable solution. Nonetheless, research has revealed a critical limitation in GCWSL: the absence of trajectory stitching capabilities. In response, goal data augmentation strategies have been proposed to enhance these methods. However, existing techniques often fail to sample appropriate augmented goals for GCWSL. In this paper, we establish unified principles for goal data augmentation, emphasizing goal diversity, action optimality, and goal reachability. Building on these principles, we propose a Model-based Goal Data Augmentation (MGDA) approach, which leverages a dynamics model to sample more appropriate augmented goals. MGDA uniquely incorporates the local Lipschitz continuity assumption within the learned model to mitigate the effects of compounding errors. Empirical results demonstrate that MGDA significantly improves the performance of GCWSL methods on both state-based and vision-based maze datasets, outperforming previous goal data augmentation techniques in enhancing stitching capabilities.

Original language: English
Pages (from-to): 18172-18180
Number of pages: 9
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Volume: 39
Issue number: 17
State: Published - 11 Apr 2025
Event: 39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025 - Philadelphia, United States
Duration: 25 Feb 2025 to 4 Mar 2025
