TY - GEN
T1 - Multi-Granularity Sparse Relationship Matrix Prediction Network for End-to-End Scene Graph Generation
AU - Wang, Lei
AU - Yuan, Zejian
AU - Chen, Badong
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
N2 - Current end-to-end Scene Graph Generation (SGG) relies solely on visual representations to separately detect sparse relations and entities in an image. This leads to the issue where the predictions of entities do not contribute to the prediction of relations, necessitating post-processing to assign corresponding subjects and objects to the predicted relations. In this paper, we introduce a sparse relationship matrix that bridges entity detection and relation detection. Our approach not only eliminates the need for relation matching, but also leverages the semantics and positional information of predicted entities to enhance relation prediction. Specifically, a multi-granularity sparse relationship matrix prediction network is proposed, which utilizes three gated pooling modules focusing on filtering negative samples at different granularities, thereby obtaining a sparse relationship matrix containing entity pairs most likely to form relations. Finally, a set of sparse, most probable subject-object pairs can be constructed and used for relation decoding. Experimental results on multiple datasets demonstrate that our method achieves a new state-of-the-art overall performance. Our code is available at https://github.com/wanglei0618/Mg-RMPN.
AB - Current end-to-end Scene Graph Generation (SGG) relies solely on visual representations to separately detect sparse relations and entities in an image. This leads to the issue where the predictions of entities do not contribute to the prediction of relations, necessitating post-processing to assign corresponding subjects and objects to the predicted relations. In this paper, we introduce a sparse relationship matrix that bridges entity detection and relation detection. Our approach not only eliminates the need for relation matching, but also leverages the semantics and positional information of predicted entities to enhance relation prediction. Specifically, a multi-granularity sparse relationship matrix prediction network is proposed, which utilizes three gated pooling modules focusing on filtering negative samples at different granularities, thereby obtaining a sparse relationship matrix containing entity pairs most likely to form relations. Finally, a set of sparse, most probable subject-object pairs can be constructed and used for relation decoding. Experimental results on multiple datasets demonstrate that our method achieves a new state-of-the-art overall performance. Our code is available at https://github.com/wanglei0618/Mg-RMPN.
KW - End-to-End
KW - Multi-Granularity
KW - Scene Graph Generation
KW - Sparse Relationship Matrix
UR - https://www.scopus.com/pages/publications/85206390824
U2 - 10.1007/978-3-031-73007-8_7
DO - 10.1007/978-3-031-73007-8_7
M3 - 会议稿件
AN - SCOPUS:85206390824
SN - 9783031730061
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 105
EP - 121
BT - Computer Vision – ECCV 2024 - 18th European Conference, Proceedings
A2 - Leonardis, Aleš
A2 - Ricci, Elisa
A2 - Roth, Stefan
A2 - Russakovsky, Olga
A2 - Sattler, Torsten
A2 - Varol, Gül
PB - Springer Science and Business Media Deutschland GmbH
T2 - 18th European Conference on Computer Vision, ECCV 2024
Y2 - 29 September 2024 through 4 October 2024
ER -