
GRRSIS: Generalized Referring Remote Sensing Image Segmentation

  • Xi'an Jiaotong University
  • Beihang University

Research output: Contribution to journal › Article › peer-review

Abstract

Referring remote sensing image segmentation (RRSIS) is a challenging task that involves segmenting target instances within a top-view image guided by a natural language expression. Existing classic RRSIS methods commonly support only target expressions, i.e., expressions whose described target is present in the image; no-target expressions are excluded. Under this constraint, a small error in the expression, such as a typographical mistake, could cause a complete failure of the model. To overcome this issue, in this article we introduce a new benchmark called generalized RRSIS (GRRSIS), which extends classic RRSIS by allowing expressions that refer to no target in the image. To this end, we construct the first large-scale dataset for GRRSIS, called GRRSIS-D, which includes multitarget, single-target, and no-target expressions. The core challenges in GRRSIS stem from the fact that objects in aerial images often occupy only a small number of pixels, exhibit significant orientation variations, and present varying levels of recognition difficulty. To tackle these challenges, we propose an oriented-aware multiscale network with an adaptive angle sensing module that integrates adaptive rotated convolution and a gating mechanism to capture diverse object orientations while suppressing irrelevant features for more accurate representations. In addition, we introduce a novel online hard case mining loss, which allocates varying levels of attention to foreground and background regions and reshapes the standard loss by downweighting well-segmented examples, addressing the issues caused by low pixel occupancy and uneven sample difficulty. The proposed approach achieves state-of-the-art performance on both the newly introduced GRRSIS task and the classic RRSIS task.
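
The abstract does not give the formula for the online hard case mining loss. As a rough illustration of the two ideas it names (separate attention for foreground and background pixels, and downweighting of well-segmented examples), the sketch below uses a focal-style weighted binary cross-entropy. This is a stand-in technique, not the authors' loss: the function name `hard_case_weighted_bce` and the parameters `fg_weight`, `bg_weight`, and `gamma` are hypothetical and chosen only for this example.

```python
import math


def hard_case_weighted_bce(probs, targets, fg_weight=0.75, bg_weight=0.25,
                           gamma=2.0, eps=1e-7):
    """Mean focal-style binary cross-entropy over a flat list of pixels.

    probs   -- predicted foreground probabilities, each in [0, 1]
    targets -- ground-truth labels, 1 for foreground and 0 for background

    Assumption: this is a generic focal-style loss used for illustration,
    not the formulation published in the GRRSIS article.
    """
    if len(probs) != len(targets):
        raise ValueError("probs and targets must have the same length")
    if not probs:
        return 0.0

    total = 0.0
    for p, t in zip(probs, targets):
        # Clamp to avoid log(0).
        p = min(max(p, eps), 1.0 - eps)
        # Probability the model assigns to the true class of this pixel.
        p_t = p if t == 1 else 1.0 - p
        # Different attention for foreground and background regions.
        region_weight = fg_weight if t == 1 else bg_weight
        # (1 - p_t) ** gamma shrinks the loss of well-segmented pixels.
        total += -region_weight * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)
```

Under this sketch, a foreground pixel predicted at 0.30 contributes far more than one predicted at 0.95, and setting `gamma=0.0` with both weights at 1.0 recovers plain binary cross-entropy. If a no-target expression is represented by an all-background mask (an assumption here, not something the abstract states), confident background predictions yield a loss close to zero.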

Original language: English
Article number: 5656017
Journal: IEEE Transactions on Geoscience and Remote Sensing
Volume: 63
DOIs
State: Published - 2025

Keywords

  • Adaptive multimodal feature fusion (AMFF)
  • generalized referring remote sensing segmentation
  • generalized referring remote sensing segmentation dataset
  • online hard case mining loss
