Abstract
Long-horizon robotic manipulation poses three difficulties: the action sequences learned from offline skill data are diverse, the relationship between natural-language instruction comprehension and long-horizon task semantics is complex, and the information density is high. To address these challenges, a long-horizon task planning algorithm based on a multi-modal diffusion policy, named MMDPP, is proposed to improve task completion rate and robustness in complex environments. The method uses a large vision-language model to convert natural-language tasks into structured task elements, introduces a multi-modal fusion module that jointly models low-dimensional state, image observations, and task semantics, and uses selective channels to reduce gradient conflict and gradient cross-interference. On this basis, a conditional diffusion generative model is constructed that directly outputs structurally consistent, task-aligned action sequences, realizing end-to-end policy planning from language input to action prediction. In the MuJoCo-Kitchen-Image kitchen environment (a self-constructed dataset), MMDPP significantly outperforms the baseline method in long-horizon task success rate; in the Robosuite-Kitchen environment, it surpasses SiMPL by 2.4%; and on the UR5 physical robot platform it achieves an 80% success rate in table-top rearrangement tasks, demonstrating good accuracy and real-world adaptability in manipulation tasks. The proposed method significantly improves the adaptability of action policy learning to task changes in long-horizon tasks and provides an effective paradigm for diffusion-based long-horizon robot planning.
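As a rough illustration of how a conditional diffusion model can turn a fused condition vector into an action sequence, the following minimal sketch runs standard DDPM-style reverse sampling in NumPy. It is not the MMDPP implementation: the concatenation-based `fuse_condition`, the untrained `make_toy_denoiser` (which simply pulls samples toward one fixed, condition-dependent target), the linear noise schedule, and all dimensions are assumptions made here for illustration only.

```python
import numpy as np

def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule: betas, alphas, cumulative alpha products."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    return betas, alphas, np.cumprod(alphas)

def fuse_condition(state, image_feat, task_feat):
    """Assumed stand-in for a fusion module: plain concatenation."""
    return np.concatenate([state, image_feat, task_feat])

def make_toy_denoiser(alpha_bars, horizon, action_dim, cond_dim, seed=0):
    """Hypothetical noise predictor; a real system would use a trained network.

    It maps the condition to one fixed target sequence and returns the noise
    estimate that is exact when the data distribution is that single sequence.
    """
    w = np.random.default_rng(seed).normal(size=(horizon * action_dim, cond_dim))

    def target(cond):
        return np.tanh(w @ cond).reshape(horizon, action_dim)

    def predict_noise(x_t, t, cond):
        scaled_target = np.sqrt(alpha_bars[t]) * target(cond)
        return (x_t - scaled_target) / np.sqrt(1.0 - alpha_bars[t])

    return predict_noise, target

def sample_actions(predict_noise, cond, horizon, action_dim, schedule, rng):
    """DDPM reverse process: start from Gaussian noise, denoise step by step."""
    betas, alphas, alpha_bars = schedule
    x = rng.normal(size=(horizon, action_dim))
    for t in range(len(betas) - 1, -1, -1):
        eps = predict_noise(x, t, cond)
        # Posterior mean of x_{t-1} given x_t and the predicted noise.
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            # Re-inject noise on every step except the last one.
            x = x + np.sqrt(betas[t]) * rng.normal(size=x.shape)
    return x

schedule = make_schedule(50)
cond = fuse_condition(np.ones(3), np.zeros(4), np.full(2, 0.5))
predict_noise, target = make_toy_denoiser(schedule[2], horizon=8, action_dim=2, cond_dim=cond.size)
actions = sample_actions(predict_noise, cond, 8, 2, schedule, np.random.default_rng(1))
```

Here `actions` is an 8-step sequence of 2-dimensional actions. In a trained system, the toy denoiser would be replaced by a learned network that takes the fused state, image, and task-semantic features as its condition.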
| Translated title of the contribution | Long-horizon Task Planning Based on Multi-modal Diffusion Policy |
|---|---|
| Original language | Chinese (Traditional) |
| Pages (from-to) | 548-558 |
| Number of pages | 11 |
| Journal | Jiqiren/Robot |
| Volume | 47 |
| Issue number | 4 |
| DOIs | |
| State | Published - Jul 2025 |