TY - GEN
T1 - VrR-VG
T2 - 17th IEEE/CVF International Conference on Computer Vision, ICCV 2019
AU - Liang, Yuanzhi
AU - Bai, Yalong
AU - Zhang, Wei
AU - Qian, Xueming
AU - Zhu, Li
AU - Mei, Tao
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/10
Y1 - 2019/10
N2 - Relationships encode the interactions among individual instances and play a critical role in deep visual scene understanding. Suffering from the high predictability with non-visual information, relationship models tend to fit the statistical bias rather than ''learning' to infer the relationships from images. To encourage further development in visual relationships, we propose a novel method to mine more valuable relationships by automatically pruning visually-irrelevant relationships. We construct a new scene graph dataset named Visually-Relevant Relationships Dataset (VrR-VG) based on Visual Genome. Compared with existing datasets, the performance gap between learnable and statistical method is more significant in VrR-VG, and frequency-based analysis does not work anymore. Moreover, we propose to learn a relationship-aware representation by jointly considering instances, attributes and relationships. By applying the representation-aware feature learned on VrR-VG, the performances of image captioning and visual question answering are systematically improved, which demonstrates the effectiveness of both our dataset and features embedding schema. Both our VrR-VG dataset and representation-aware features will be made publicly available soon.
AB - Relationships encode the interactions among individual instances and play a critical role in deep visual scene understanding. Suffering from the high predictability with non-visual information, relationship models tend to fit the statistical bias rather than ''learning' to infer the relationships from images. To encourage further development in visual relationships, we propose a novel method to mine more valuable relationships by automatically pruning visually-irrelevant relationships. We construct a new scene graph dataset named Visually-Relevant Relationships Dataset (VrR-VG) based on Visual Genome. Compared with existing datasets, the performance gap between learnable and statistical method is more significant in VrR-VG, and frequency-based analysis does not work anymore. Moreover, we propose to learn a relationship-aware representation by jointly considering instances, attributes and relationships. By applying the representation-aware feature learned on VrR-VG, the performances of image captioning and visual question answering are systematically improved, which demonstrates the effectiveness of both our dataset and features embedding schema. Both our VrR-VG dataset and representation-aware features will be made publicly available soon.
UR - https://www.scopus.com/pages/publications/85081938725
U2 - 10.1109/ICCV.2019.01050
DO - 10.1109/ICCV.2019.01050
M3 - 会议稿件
AN - SCOPUS:85081938725
T3 - Proceedings of the IEEE International Conference on Computer Vision
SP - 10402
EP - 10411
BT - Proceedings - 2019 International Conference on Computer Vision, ICCV 2019
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 27 October 2019 through 2 November 2019
ER -