Connecting language to images: A progressive attention-guided network for simultaneous image captioning and language grounding

  • Lingyun Song
  • Jun Liu
  • Buyue Qian
  • Yihe Chen

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

10 Scopus citations

Abstract

Image captioning and visual language grounding are two important tasks in image understanding, but they are seldom addressed together. In this paper, we propose a Progressive Attention-Guided Network (PAGNet), which simultaneously generates image captions and predicts bounding boxes for the caption words. PAGNet has two distinctive properties: i) it progressively refines its image-captioning predictions by updating the attention map with the predicted bounding boxes; ii) it learns the bounding boxes of words with a weakly supervised strategy that combines the Multiple Instance Learning (MIL) and Markov Decision Process (MDP) frameworks. Because it reuses the attention map generated during captioning, PAGNet significantly reduces the search space of the MDP. Experiments on benchmark datasets demonstrate the effectiveness of PAGNet, and the results show that it achieves the best performance.
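The abstract's point that a captioning attention map can shrink the box search space for grounding can be illustrated with a toy sketch. This is not PAGNet's implementation: the grid-cell box representation, the `min_mass` threshold, the attention-density instance score, and every function name below are hypothetical. An exhaustive scan over the surviving candidates stands in for the learned MDP policy, which is not shown. Only the scoring rule follows a standard MIL formulation (max pooling), in which a bag (here, a caption word) is scored by its best instance (here, a candidate box).

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: a box is (x0, y0, x1, y1) in attention-grid
# cells, with x1 and y1 exclusive. Not taken from the paper.
Box = Tuple[int, int, int, int]
Grid = List[List[float]]


def enumerate_boxes(height: int, width: int) -> List[Box]:
    """Every axis-aligned box on a height x width grid (the unpruned search space)."""
    return [
        (x0, y0, x1, y1)
        for y0 in range(height)
        for y1 in range(y0 + 1, height + 1)
        for x0 in range(width)
        for x1 in range(x0 + 1, width + 1)
    ]


def attention_mass(att: Grid, box: Box) -> float:
    """Total attention weight that falls inside the box."""
    x0, y0, x1, y1 = box
    return sum(att[y][x] for y in range(y0, y1) for x in range(x0, x1))


def prune_by_attention(att: Grid, boxes: Sequence[Box], min_mass: float) -> List[Box]:
    """Keep only boxes capturing at least `min_mass` attention (hypothetical rule)."""
    return [b for b in boxes if attention_mass(att, b) >= min_mass]


def density_score(att: Grid, box: Box) -> float:
    """Hypothetical instance score: attention mass per cell, which favours tight boxes."""
    x0, y0, x1, y1 = box
    return attention_mass(att, box) / ((x1 - x0) * (y1 - y0))


def mil_ground_word(att: Grid, candidates: Sequence[Box]) -> Tuple[Box, float]:
    """MIL max pooling: the word (bag) takes the score of its best box (instance).

    Raises ValueError if `candidates` is empty.
    """
    best = max(candidates, key=lambda b: density_score(att, b))
    return best, density_score(att, best)


# Toy 4x4 attention map for one caption word: all weight on the top-left 2x2 cells.
att = [
    [0.25, 0.25, 0.0, 0.0],
    [0.25, 0.25, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
all_boxes = enumerate_boxes(4, 4)
kept = prune_by_attention(att, all_boxes, min_mass=1.0)
box, score = mil_ground_word(att, kept)
print(len(all_boxes), len(kept), box, score)  # 100 9 (0, 0, 2, 2) 0.25
```

In this toy case the attention threshold cuts the candidate set from 100 boxes to 9 before any grounding search runs. This is the kind of reduction the abstract attributes to reusing the captioning attention map, although the paper's actual mechanism may differ.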

Original language: English
Title of host publication: 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Innovative Applications of Artificial Intelligence Conference, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019
Publisher: AAAI Press
Pages: 8885-8892
Number of pages: 8
ISBN (Electronic): 9781577358091
State: Published - 2019
Event: 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Annual Conference on Innovative Applications of Artificial Intelligence, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019 - Honolulu, United States
Duration: 27 Jan 2019 to 1 Feb 2019

Publication series

Name: 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Innovative Applications of Artificial Intelligence Conference, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019

Conference

Conference: 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Annual Conference on Innovative Applications of Artificial Intelligence, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019
Country/Territory: United States
City: Honolulu
Period: 27/01/19 to 1/02/19
