Safe Refined Oil Dispatching via Constrained Multiagent Reinforcement Learning With Hierarchical Action Spaces

  • Kun Tang
  • Chengwei Zhang
  • Wanting Liu
  • Xue Li
  • Qi Wang
  • Dou An
  • Furui Zhan

Research output: Contribution to journal › Article › peer-review

Abstract

The rise in urbanization and the demand for refined oil pose a significant challenge in improving transportation efficiency while ensuring the safety of gas station inventories. Traditional optimization methods, which rely on deterministic models or demand forecasting, enhance efficiency but struggle with uncertainties such as demand fluctuations. Multi-agent reinforcement learning (MARL) presents a promising approach for adaptive cooperative dispatching. This study models the refined oil dispatch problem as a constrained partially observable Markov game (CPOMG), in which agents make cooperative decisions under inventory constraints with limited observations, and proposes Hierarchical Action-Constrained MAPPO (HAC-MAPPO), a novel cooperative MARL algorithm, to find the optimal cooperative dispatching policy of the game. Within the CPOMG framework, the HAC-MAPPO agents optimize order generation decisions based on the states of gas station inventories. The subsequent delivery routing for these orders is optimized using an integrated fuel routing tabu search algorithm, which is part of the environment's state transition logic. HAC-MAPPO incorporates inventory safety thresholds to constrain order generation decisions across fuel types. It employs a dual-critic architecture and a shared hierarchical actor network for efficient decentralized policy learning. Crucially, the Lagrangian relaxation technique is applied to transform the constrained actor optimization objective (for order generation) into an unconstrained form. Evaluations carried out in real-world scenarios (based on actual geographic coordinates) and in synthetic scenarios, using multiple types of synthetic fuel consumption datasets, demonstrate that the proposed model and algorithm significantly reduce inventory violations and improve dispatch efficiency and safety under varying demand patterns, outperforming baseline methods.
Note to Practitioners - The efficient and stable dispatch of refined oil products is a critical challenge in the petroleum supply chain, directly impacting operational costs and service reliability. Traditional methods often rely on static demand forecasts, which struggle to adapt to dynamic market fluctuations, leading to inefficiencies such as delayed deliveries, stockouts, or overstocking. This paper presents a practical solution leveraging multi-agent reinforcement learning (MARL) to optimize dispatching decisions in real time, ensuring both efficiency and safety. Our approach, HAC-MAPPO, enables gas stations, depots, and transport vehicles to collaboratively adapt to changing demand patterns while adhering to inventory safety constraints. By integrating hierarchical action selection and adaptive routing algorithms, the system minimizes delivery delays and reduces inventory risks, such as shortages or excess stock. This method has been validated in both synthetic and real-world scenarios, demonstrating significant improvements in dispatch efficiency and operational stability. For industry practitioners, this work offers a scalable framework that can be integrated into existing logistics systems with minimal disruption. Potential applications include large-scale fuel distribution networks, where dynamic demand and complex routing are common challenges. Although the current implementation focuses on refined oil products, the methodology can be extended to other supply chain domains with similar requirements, such as chemical or liquid goods transportation. Future enhancements could involve incorporating additional real-world constraints, such as vehicle maintenance schedules or traffic conditions, to further improve practicality.
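
The Lagrangian relaxation step mentioned in the abstract can be illustrated with a minimal sketch. This is a generic Lagrangian treatment of a constrained policy objective, not the paper's actual HAC-MAPPO formulation, which the abstract does not spell out. The function names, the reading of the dual critics as separate reward and cost estimators, and the cost limit and step size are all assumptions made for illustration.

```python
from typing import List


def lagrangian_surrogate(
    reward_advantages: List[float],
    cost_advantages: List[float],
    lam: float,
) -> List[float]:
    """Fold the constraint into the objective (hypothetical helper).

    Each sample's reward advantage is penalized by lam times its cost
    advantage, so the actor can maximize a single unconstrained signal.
    """
    return [a_r - lam * a_c for a_r, a_c in zip(reward_advantages, cost_advantages)]


def update_multiplier(
    lam: float,
    mean_episode_cost: float,
    cost_limit: float,
    step_size: float,
) -> float:
    """Projected dual ascent on the multiplier (hypothetical helper).

    lam grows while the observed cost (e.g. inventory violations) exceeds
    the assumed limit, shrinks otherwise, and never drops below zero.
    """
    return max(0.0, lam + step_size * (mean_episode_cost - cost_limit))


if __name__ == "__main__":
    lam = 0.5
    # Assumed numbers: average violation cost 3.0 against a limit of 1.0.
    lam = update_multiplier(lam, mean_episode_cost=3.0, cost_limit=1.0, step_size=0.25)
    print(lam)
    print(lagrangian_surrogate([1.0, 2.0], [0.5, 0.0], lam))
```

In a sketch of this kind, a larger multiplier makes the actor weigh constraint violations more heavily relative to dispatch reward, which is the trade-off a Lagrangian method adjusts automatically during training.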

Original language: English
Pages (from-to): 23164-23176
Number of pages: 13
Journal: IEEE Transactions on Automation Science and Engineering
Volume: 22
State: Published - 2025

Keywords

  • constrained partially observable Markov game
  • Multiagent reinforcement learning
  • order generation
  • refined oil dispatching
