TY - JOUR
T1 - Dual Graph Attention Networks for Multi-View Visual Manipulation Relationship Detection and Robotic Grasping
AU - Ding, Mengyuan
AU - Liu, Yaxin
AU - Shi, Yaorui
AU - Lan, Xuguang
AU - Zheng, Nanning
N1 - Publisher Copyright:
© 2004-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - Visual manipulation relationship detection enables robots to perform safe, orderly, and efficient grasping tasks. However, most existing algorithms model only object-level or relational-level dependencies individually and lack sufficient global information, making it difficult to handle different types of reasoning errors, especially in complex environments with multi-object stacking and occlusion. To solve these problems, we propose Dual Graph Attention Networks (Dual-GAT) for visual manipulation relationship detection, with an object-level graph network for capturing object-level dependencies and a relational-level graph network for capturing relational-triplet-level interactions. The attention mechanism assigns different weights to different dependencies, obtains more accurate global context information for reasoning, and produces a manipulation relationship graph. In addition, we use multi-view feature fusion to enhance the features of occluded objects, thereby improving relationship detection performance in multi-object scenes. Finally, our method is deployed on a robot to construct a multi-object grasping system that can be readily applied to stacking environments. Experimental results on the VMRD and REGRAD datasets show that our method significantly outperforms existing methods.
AB - Visual manipulation relationship detection enables robots to perform safe, orderly, and efficient grasping tasks. However, most existing algorithms model only object-level or relational-level dependencies individually and lack sufficient global information, making it difficult to handle different types of reasoning errors, especially in complex environments with multi-object stacking and occlusion. To solve these problems, we propose Dual Graph Attention Networks (Dual-GAT) for visual manipulation relationship detection, with an object-level graph network for capturing object-level dependencies and a relational-level graph network for capturing relational-triplet-level interactions. The attention mechanism assigns different weights to different dependencies, obtains more accurate global context information for reasoning, and produces a manipulation relationship graph. In addition, we use multi-view feature fusion to enhance the features of occluded objects, thereby improving relationship detection performance in multi-object scenes. Finally, our method is deployed on a robot to construct a multi-object grasping system that can be readily applied to stacking environments. Experimental results on the VMRD and REGRAD datasets show that our method significantly outperforms existing methods.
KW - Dual graph attention networks
KW - multi-view feature fusion
KW - robotic grasping
KW - visual manipulation relationship detection
UR - https://www.scopus.com/pages/publications/105003088937
U2 - 10.1109/TASE.2025.3555206
DO - 10.1109/TASE.2025.3555206
M3 - Article
AN - SCOPUS:105003088937
SN - 1545-5955
VL - 22
SP - 13694
EP - 13705
JO - IEEE Transactions on Automation Science and Engineering
JF - IEEE Transactions on Automation Science and Engineering
ER -