Learning Adaptive Differential Evolution Algorithm from Optimization Experiences by Policy Gradient

  • Xi'an Jiaotong University
  • Leiden University

Research output: Contribution to journal › Article › peer-review

145 Scopus citations

Abstract

Differential evolution is one of the most prestigious population-based stochastic optimization algorithms for black-box problems. The performance of a differential evolution algorithm depends highly on its mutation and crossover strategies and their associated control parameters. However, determining the most suitable parameter setting is troublesome and time-consuming. Adaptive parameter control methods that can adapt to the problem landscape and the optimization environment are preferable to fixed parameter settings. This article proposes a novel adaptive parameter control approach based on learning from the optimization experiences over a set of problems. In the approach, the parameter control is modeled as a finite-horizon Markov decision process. A reinforcement learning algorithm, named policy gradient, is applied to learn an agent (i.e., a parameter controller) that can provide the control parameters of a proposed differential evolution algorithm adaptively during the search procedure. The differential evolution algorithm based on the learned agent is compared against nine well-known evolutionary algorithms on the CEC'13 and CEC'17 test suites. Experimental results show that the proposed algorithm performs competitively against these algorithms on the test suites.
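
The idea of a parameter controller supplying control parameters during the search can be illustrated with a minimal sketch. This is not the article's algorithm: it assumes a plain DE/rand/1/bin scheme, a hypothetical `controller(state)` callback that returns a scale factor `F` and crossover rate `CR` once per generation, and made-up state features (`step`, `horizon`, `best`, `mean`). A learned policy-gradient agent would take the place of the fixed stand-in controller used here.

```python
import numpy as np


def de_with_controller(objective, bounds, controller,
                       pop_size=20, generations=50, seed=0):
    """DE/rand/1/bin where a controller picks (F, CR) each generation.

    All names and the state layout are illustrative assumptions,
    not the interface used in the article.
    """
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(bounds, dtype=float).T
    dim = lower.size
    pop = rng.uniform(lower, upper, size=(pop_size, dim))
    fitness = np.array([objective(x) for x in pop])

    for step in range(generations):  # finite horizon: one decision per generation
        # Hypothetical state summary handed to the controller.
        state = {
            "step": step,
            "horizon": generations,
            "best": float(fitness.min()),
            "mean": float(fitness.mean()),
        }
        F, CR = controller(state)  # the "action" of the parameter controller

        for i in range(pop_size):
            # Pick three distinct individuals, all different from i.
            others = [j for j in range(pop_size) if j != i]
            r1, r2, r3 = rng.choice(others, size=3, replace=False)
            mutant = np.clip(pop[r1] + F * (pop[r2] - pop[r3]), lower, upper)

            # Binomial crossover; force at least one gene from the mutant.
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True
            trial = np.where(cross, mutant, pop[i])

            # Greedy one-to-one selection (minimization).
            f_trial = objective(trial)
            if f_trial <= fitness[i]:
                pop[i], fitness[i] = trial, f_trial

    best = int(np.argmin(fitness))
    return pop[best], float(fitness[best])


def fixed_controller(state):
    """Stand-in controller with constant parameters (assumed values)."""
    return 0.5, 0.9


def sphere(x):
    """Simple test objective: sum of squares."""
    return float(np.sum(x ** 2))


best_x, best_f = de_with_controller(sphere, [(-5.0, 5.0)] * 5, fixed_controller)
```

Passing the controller as a callback keeps the search loop unchanged whether the parameters come from a fixed rule, a hand-written schedule, or a trained agent.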

Original language: English
Article number: 9359652
Pages (from-to): 666-680
Number of pages: 15
Journal: IEEE Transactions on Evolutionary Computation
Volume: 25
Issue number: 4
DOIs
State: Published - Aug 2021

Keywords

  • Adaptive differential evolution
  • deep learning
  • global optimization
  • policy gradient (PG)
  • reinforcement learning (RL)
