TY - JOUR
T1 - Safe Refined Oil Dispatching via Constrained Multiagent Reinforcement Learning With Hierarchical Action Spaces
AU - Tang, Kun
AU - Zhang, Chengwei
AU - Liu, Wanting
AU - Li, Xue
AU - Wang, Qi
AU - An, Dou
AU - Zhan, Furui
N1 - Publisher Copyright:
© 2004-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - The rise in urbanization and the demand for refined oil pose a significant challenge in improving transportation efficiency while ensuring the safety of gas station inventories. Traditional optimization methods, which rely on deterministic models or demand forecasting, enhance efficiency but struggle with uncertainties such as demand fluctuations. Multi-agent reinforcement learning (MARL) presents a promising approach for adaptive cooperative dispatching. This study models the refined oil dispatch problem as a constrained partially observable Markov game (CPOMG), in which agents make cooperative decisions under inventory constraints with limited observations, and proposes Hierarchical Action-Constrained MAPPO (HAC-MAPPO), a novel cooperative MARL algorithm that finds the optimal cooperative dispatching policy of the game. Within the CPOMG framework, the HAC-MAPPO agents optimize order generation decisions based on the states of gas station inventories. The subsequent delivery routing for these orders is optimized using an integrated fuel routing tabu search algorithm, which is part of the environment's state transition logic. HAC-MAPPO incorporates inventory safety thresholds to constrain order generation decisions across fuel types. It employs a dual-critic architecture and a shared hierarchical actor network for efficient decentralized policy learning. Crucially, the Lagrangian relaxation technique is applied to transform the constrained actor optimization objective (for order generation) into an unconstrained form. Evaluations carried out in real-world scenarios (based on actual geographic coordinates) and synthetic scenarios using multiple types of synthetic fuel consumption datasets demonstrate that the proposed model and algorithm significantly reduce inventory violations and improve dispatch efficiency and safety under varying demand patterns, outperforming baseline methods.
Note to Practitioners - The efficient and stable dispatch of refined oil products is a critical challenge in the petroleum supply chain, directly impacting operational costs and service reliability. Traditional methods often rely on static demand forecasts, which struggle to adapt to dynamic market fluctuations, leading to inefficiencies such as delayed deliveries, stockouts, or overstocking. This paper presents a practical solution leveraging multi-agent reinforcement learning (MARL) to optimize dispatching decisions in real time, ensuring both efficiency and safety. Our approach, HAC-MAPPO, enables gas stations, depots, and transport vehicles to collaboratively adapt to changing demand patterns while adhering to inventory safety constraints. By integrating hierarchical action selection with adaptive routing algorithms, the system minimizes delivery delays and reduces inventory risks, such as shortages or excess stock. This method has been validated in both synthetic and real-world scenarios, demonstrating significant improvements in dispatch efficiency and operational stability. For industry practitioners, this work offers a scalable framework that can be integrated into existing logistics systems with minimal disruption. Potential applications include large-scale fuel distribution networks, where dynamic demand and complex routing are common challenges. Although the current implementation focuses on refined oil products, the methodology can be extended to other supply chain domains with similar requirements, such as chemical or liquid goods transportation. Future enhancements could involve incorporating additional real-world constraints, such as vehicle maintenance schedules or traffic conditions, to further improve practicality.
AB - The rise in urbanization and the demand for refined oil pose a significant challenge in improving transportation efficiency while ensuring the safety of gas station inventories. Traditional optimization methods, which rely on deterministic models or demand forecasting, enhance efficiency but struggle with uncertainties such as demand fluctuations. Multi-agent reinforcement learning (MARL) presents a promising approach for adaptive cooperative dispatching. This study models the refined oil dispatch problem as a constrained partially observable Markov game (CPOMG), in which agents make cooperative decisions under inventory constraints with limited observations, and proposes Hierarchical Action-Constrained MAPPO (HAC-MAPPO), a novel cooperative MARL algorithm that finds the optimal cooperative dispatching policy of the game. Within the CPOMG framework, the HAC-MAPPO agents optimize order generation decisions based on the states of gas station inventories. The subsequent delivery routing for these orders is optimized using an integrated fuel routing tabu search algorithm, which is part of the environment's state transition logic. HAC-MAPPO incorporates inventory safety thresholds to constrain order generation decisions across fuel types. It employs a dual-critic architecture and a shared hierarchical actor network for efficient decentralized policy learning. Crucially, the Lagrangian relaxation technique is applied to transform the constrained actor optimization objective (for order generation) into an unconstrained form. Evaluations carried out in real-world scenarios (based on actual geographic coordinates) and synthetic scenarios using multiple types of synthetic fuel consumption datasets demonstrate that the proposed model and algorithm significantly reduce inventory violations and improve dispatch efficiency and safety under varying demand patterns, outperforming baseline methods.
Note to Practitioners - The efficient and stable dispatch of refined oil products is a critical challenge in the petroleum supply chain, directly impacting operational costs and service reliability. Traditional methods often rely on static demand forecasts, which struggle to adapt to dynamic market fluctuations, leading to inefficiencies such as delayed deliveries, stockouts, or overstocking. This paper presents a practical solution leveraging multi-agent reinforcement learning (MARL) to optimize dispatching decisions in real time, ensuring both efficiency and safety. Our approach, HAC-MAPPO, enables gas stations, depots, and transport vehicles to collaboratively adapt to changing demand patterns while adhering to inventory safety constraints. By integrating hierarchical action selection with adaptive routing algorithms, the system minimizes delivery delays and reduces inventory risks, such as shortages or excess stock. This method has been validated in both synthetic and real-world scenarios, demonstrating significant improvements in dispatch efficiency and operational stability. For industry practitioners, this work offers a scalable framework that can be integrated into existing logistics systems with minimal disruption. Potential applications include large-scale fuel distribution networks, where dynamic demand and complex routing are common challenges. Although the current implementation focuses on refined oil products, the methodology can be extended to other supply chain domains with similar requirements, such as chemical or liquid goods transportation. Future enhancements could involve incorporating additional real-world constraints, such as vehicle maintenance schedules or traffic conditions, to further improve practicality.
KW - constrained partially observable Markov game
KW - Multiagent reinforcement learning
KW - order generation
KW - refined oil dispatching
UR - https://www.scopus.com/pages/publications/105020070020
U2 - 10.1109/TASE.2025.3625392
DO - 10.1109/TASE.2025.3625392
M3 - Article
AN - SCOPUS:105020070020
SN - 1545-5955
VL - 22
SP - 23164
EP - 23176
JO - IEEE Transactions on Automation Science and Engineering
JF - IEEE Transactions on Automation Science and Engineering
ER -