TY - GEN
T1 - Visual Manipulation Relationship Network for Autonomous Robotics
AU - Zhang, Hanbo
AU - Lan, Xuguang
AU - Zhou, Xinwen
AU - Tian, Zhiqiang
AU - Zhang, Yang
AU - Zheng, Nanning
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/7/2
Y1 - 2018/7/2
AB - Robotic grasping is one of the most important problems in robotics, and recent progress has been driven largely by convolutional neural networks (CNNs). However, scenes containing multiple objects can invalidate existing CNN-based grasp detection algorithms, because these algorithms ignore the manipulation relationships among objects that a robot needs in order to grasp things in the right order. This paper presents a new CNN architecture, the Visual Manipulation Relationship Network (VMRN), that enables robots to detect targets and predict manipulation relationships in real time, ensuring that tasks are completed safely and reliably. To enable end-to-end training and meet the real-time requirements of robot tasks, we propose the Object Pairing Pooling Layer (OP2L), which predicts all manipulation relationships in a single forward pass. To train VMRN, we collect the Visual Manipulation Relationship Dataset (VMRD), consisting of 5185 images with more than 17,000 object instances; in each image, the manipulation relationships between all possible pairs of objects are labeled as a manipulation relationship tree. Experimental results show that the new architecture detects objects and predicts manipulation relationships simultaneously while meeting the real-time requirements of robot tasks.
UR - https://www.scopus.com/pages/publications/85062263136
U2 - 10.1109/HUMANOIDS.2018.8625071
DO - 10.1109/HUMANOIDS.2018.8625071
M3 - Conference contribution
AN - SCOPUS:85062263136
T3 - IEEE-RAS International Conference on Humanoid Robots
SP - 118
EP - 125
BT - 2018 IEEE-RAS 18th International Conference on Humanoid Robots, Humanoids 2018
PB - IEEE Computer Society
T2 - 18th IEEE-RAS International Conference on Humanoid Robots, Humanoids 2018
Y2 - 6 November 2018 through 9 November 2018
ER -