Large-Scale Traffic Signal Control Using a Novel Multiagent Reinforcement Learning

Research output: Contribution to journal › Article › peer-review

162 Scopus citations

Abstract

Finding the optimal signal timing strategy is a difficult task in large-scale traffic signal control (TSC). Multiagent reinforcement learning (MARL) is a promising method for solving this problem, but existing approaches leave room for improvement both in scaling to large problems and in modeling the behaviors of other agents from each individual agent's perspective. In this article, a new MARL algorithm, cooperative double Q-learning (Co-DQL), is proposed, which has several prominent features. It uses a highly scalable independent double Q-learning method based on double estimators and the upper confidence bound (UCB) policy, which eliminates the overestimation problem of traditional independent Q-learning while ensuring exploration. It uses mean-field approximation to model the interaction among agents, thereby enabling agents to learn a better cooperative strategy. To improve the stability and robustness of the learning process, we introduce a new reward allocation mechanism and a local state sharing method. In addition, we analyze the convergence properties of the proposed algorithm. Co-DQL is applied to TSC and tested on various traffic flow scenarios in TSC simulators. The results show that Co-DQL outperforms state-of-the-art decentralized MARL algorithms on multiple traffic metrics.
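The core mechanism the abstract describes, two independent Q-estimators combined with a UCB exploration policy, can be illustrated with a minimal sketch. This is standard tabular double Q-learning in a single-agent toy setting, not the paper's full Co-DQL (which adds mean-field interaction modeling, reward allocation, and state sharing); all function names and the toy task are illustrative assumptions.

```python
import math
import random
from collections import defaultdict

def ucb_action(qa, qb, counts, state, actions, t, c=1.0):
    """UCB policy over the averaged estimators: prefer high-value actions,
    but add an exploration bonus for rarely tried ones."""
    def score(a):
        n = counts[(state, a)]
        if n == 0:
            return float('inf')  # force at least one try of each action
        avg_q = 0.5 * (qa[(state, a)] + qb[(state, a)])
        return avg_q + c * math.sqrt(math.log(t) / n)
    return max(actions, key=score)

def double_q_update(qa, qb, s, a, r, s2, actions, alpha=0.1, gamma=0.9):
    """Double-estimator update: pick one table at random, select the greedy
    next action with it, but EVALUATE that action with the other table.
    Decoupling selection from evaluation removes the positive bias that
    the single-estimator max operator introduces."""
    if random.random() < 0.5:
        a_star = max(actions, key=lambda x: qa[(s2, x)])
        qa[(s, a)] += alpha * (r + gamma * qb[(s2, a_star)] - qa[(s, a)])
    else:
        b_star = max(actions, key=lambda x: qb[(s2, x)])
        qb[(s, a)] += alpha * (r + gamma * qa[(s2, b_star)] - qb[(s, a)])

# Toy usage: a one-state task where action 1 pays reward 1 and action 0 pays 0.
random.seed(0)
qa, qb = defaultdict(float), defaultdict(float)
counts = defaultdict(int)
actions = [0, 1]
for t in range(1, 501):
    a = ucb_action(qa, qb, counts, 's', actions, t)
    r = 1.0 if a == 1 else 0.0
    counts[('s', a)] += 1
    double_q_update(qa, qb, 's', a, r, 's', actions)
```

After training, the averaged estimate for action 1 dominates and UCB concentrates its choices there, while the double estimators keep the value targets from being systematically inflated.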

Original language: English
Article number: 9186324
Pages (from-to): 174-187
Number of pages: 14
Journal: IEEE Transactions on Cybernetics
Volume: 51
Issue number: 1
DOIs
State: Published - Jan 2021
Externally published: Yes

Keywords

  • Double estimators
  • mean-field approximation
  • multiagent reinforcement learning (MARL)
  • traffic signal control (TSC)

