Abstract
Visual relationship detection aims to mine the interactions between paired objects in an image and to describe the image as (subject-predicate-object) triplets. Most previous work treats it as a pure classification problem, taking each complete triplet as a label; however, the large number of object combinations and the diversity of predicates make this approach difficult. We therefore propose a deep model based on a modified bidirectional recurrent neural network (BRNN) that classifies objects and predicts predicates simultaneously. Using the BRNN, the model extracts the hidden relationship information in the image, and we propose a feature-infusion method. Additionally, we improve on existing work by introducing a paired non-maximum suppression method. Experiments show that our approach is competitive with state-of-the-art methods.
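The abstract names a paired non-maximum suppression (NMS) step but does not describe how it works. As a rough illustration only, the sketch below shows one plausible reading: candidate (subject box, object box) pairs are ranked by score, and a pair is dropped only when both of its boxes overlap the corresponding boxes of an already kept, higher-scoring pair. The suppression rule, the function names `iou` and `paired_nms`, the `threshold` value, and the example boxes are all assumptions made for this sketch, not details taken from the paper.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical layout
Pair = Tuple[Box, Box]                   # (subject_box, object_box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def paired_nms(pairs: Sequence[Pair], scores: Sequence[float],
               threshold: float = 0.5) -> List[int]:
    """Greedy NMS over box pairs; returns indices of kept pairs.

    Assumed rule (not from the paper): a pair is a duplicate only if BOTH
    its subject box and its object box overlap those of a kept pair by
    more than `threshold`.
    """
    # Visit candidates from highest to lowest score.
    order = sorted(range(len(pairs)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        subj_i, obj_i = pairs[i]
        duplicate = any(
            iou(subj_i, pairs[k][0]) > threshold
            and iou(obj_i, pairs[k][1]) > threshold
            for k in keep
        )
        if not duplicate:
            keep.append(i)
    return keep


# Hypothetical candidates: the second pair nearly duplicates the first,
# while the third shares the subject box but has a different object box.
example_pairs = [
    ((0.0, 0.0, 10.0, 10.0), (20.0, 0.0, 30.0, 10.0)),
    ((1.0, 0.0, 11.0, 10.0), (20.0, 1.0, 30.0, 11.0)),
    ((0.0, 0.0, 10.0, 10.0), (50.0, 50.0, 60.0, 60.0)),
]
example_scores = [0.9, 0.8, 0.7]
kept = paired_nms(example_pairs, example_scores)  # [0, 2]
```

Under this assumed rule, the third candidate survives even though its subject box is identical to the first pair's, because its object box is different. Per-box NMS applied independently to subjects could not make that distinction.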
| Original language | English |
|---|---|
| Pages (from-to) | 35297-35313 |
| Number of pages | 17 |
| Journal | Multimedia Tools and Applications |
| Volume | 79 |
| Issue number | 47-48 |
| State | Published - Dec 2020 |
| Externally published | Yes |
Keywords
- Detection
- NMS
- RNN
- Visual relationship