Remote Sensing Image Captioning Using Transformer

  • Binze Wang
  • Jiangbo Xi
  • Xingrun Wang
  • Jianwu Fang
  • Wandong Jiang
  • Dashuai Xie
  • Yaobing Xiang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Image captioning generates a semantic description of an image and, with the development of deep learning, usually combines computer vision and natural language processing. A captioning model must not only recognize the important objects in the image, their attributes, and their spatial relationships with surrounding objects, but also generate text descriptions that conform to the rules of human language. In this paper, we propose an image captioning model based on the transformer. In the image understanding part, VGG16 is used to extract image information, and a transformer encoder is used to extract relations among different image regions. The text generation part extracts relations among the word features in the description and calculates the correlation between text and images from a variety of perspectives. The experimental results on the RSICD dataset for the BLEU4, METEOR, ROUGE, and CIDEr metrics are 0.29, 0.34, 0.61, and 2.53, respectively. These results are competitive with, and even better than, the SOTA results. The results show that the transformer can alleviate overfitting on small datasets, accelerate the training process, and generalize better.
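
The correlation between text and images mentioned in the abstract can be illustrated with scaled dot-product cross-attention, the standard transformer mechanism in which word features act as queries over image-region features. The sketch below is a minimal illustration in pure Python, not the authors' implementation: it uses a single head with no learned projections, and the function names and toy feature values are hypothetical. Reading "a variety of perspectives" as multiple attention heads is an assumption.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(word_feats, region_feats):
    """Scaled dot-product attention with words as queries and
    image regions as both keys and values (no learned projections).

    Returns (contexts, weights): one context vector and one row of
    attention weights per word.
    """
    dim = len(region_feats[0])
    contexts, weights = [], []
    for query in word_feats:
        # Similarity of this word to every image region, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in region_feats
        ]
        row = softmax(scores)
        # Context vector: attention-weighted sum of the region features.
        context = [
            sum(row[i] * region_feats[i][d] for i in range(len(region_feats)))
            for d in range(dim)
        ]
        contexts.append(context)
        weights.append(row)
    return contexts, weights


# Toy inputs (hypothetical values): 2 word features, 3 region features, dimension 4.
words = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
regions = [
    [4.0, 0.0, 0.0, 0.0],
    [0.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 0.0],
]
contexts, weights = cross_attention(words, regions)
```

In this toy setup each word places its largest attention weight on the region whose feature vector points in the same direction. A multi-head variant would repeat the computation on several learned projections of the same features and concatenate the resulting context vectors.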

Original language: English
Title of host publication: Proceedings of 2021 International Conference on Autonomous Unmanned Systems, ICAUS 2021
Editors: Meiping Wu, Yifeng Niu, Mancang Gu, Jin Cheng
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 3388-3397
Number of pages: 10
ISBN (Print): 9789811694912
DOIs
State: Published - 2022
Externally published: Yes
Event: International Conference on Autonomous Unmanned Systems, ICAUS 2021 - Changsha, China
Duration: 24 Sep 2021 - 26 Sep 2021

Publication series

Name: Lecture Notes in Electrical Engineering
Volume: 861 LNEE
ISSN (Print): 1876-1100
ISSN (Electronic): 1876-1119

Conference

Conference: International Conference on Autonomous Unmanned Systems, ICAUS 2021
Country/Territory: China
City: Changsha
Period: 24/09/21 - 26/09/21

Keywords

  • Image captioning
  • Remote sensing image
  • Transformer
