Abstract
Referring remote sensing image segmentation (RRSIS) is the task of segmenting target instances in a top-view image according to a natural language expression. Classic RRSIS methods typically support only target expressions, i.e., expressions whose described target is present in the image, and exclude no-target expressions. Under this constraint, models are fragile: a small error in the expression, such as a typographical mistake, can cause the model to fail completely. To overcome this issue, we introduce a new benchmark called generalized RRSIS (GRRSIS), which extends classic RRSIS by also allowing no-target expressions, i.e., expressions whose described target is absent from the image. To this end, we construct the first large-scale dataset for GRRSIS, called GRRSIS-D, which includes multitarget, single-target, and no-target expressions. The core challenges in GRRSIS stem from the fact that objects in aerial images often occupy only a small number of pixels, exhibit significant orientation variations, and present varying levels of recognition difficulty. To tackle these challenges, we propose an oriented-aware multiscale network with an adaptive angle sensing module, which integrates adaptive rotated convolution and a gating mechanism to capture diverse object orientations while suppressing irrelevant features, yielding more accurate representations. In addition, we introduce a novel online hard case mining loss, which allocates different levels of attention to foreground and background regions and reshapes the standard loss by downweighting well-segmented examples, thereby addressing the issues caused by low pixel occupancy and uneven sample difficulty. The proposed approach achieves state-of-the-art performance on both the newly introduced GRRSIS task and the classic RRSIS task.
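The abstract does not give the form of the online hard case mining loss, so the following is only a minimal sketch of the general idea it describes: a per-pixel binary cross-entropy that weights foreground and background differently and downweights pixels that are already well segmented. It uses a focal-loss-style modulating factor as a stand-in, not the authors' formulation. The function name `weighted_focal_bce` and the parameters `fg_weight`, `bg_weight`, and `gamma` are hypothetical.

```python
import math


def weighted_focal_bce(probs, targets, fg_weight=2.0, bg_weight=1.0,
                       gamma=2.0, eps=1e-7):
    """Mean focal-style binary cross-entropy over a flat list of pixels.

    probs   -- predicted foreground probabilities, each in [0, 1]
    targets -- ground-truth labels, 1 for foreground and 0 for background
    All weights and gamma are illustrative assumptions, not values
    taken from the paper.
    """
    if len(probs) != len(targets):
        raise ValueError("probs and targets must have the same length")
    if not probs:
        raise ValueError("at least one pixel is required")

    total = 0.0
    for p, t in zip(probs, targets):
        # Clamp to avoid log(0).
        p = min(max(p, eps), 1.0 - eps)
        # Probability the model assigns to the true class of this pixel.
        p_t = p if t == 1 else 1.0 - p
        # Different attention for foreground and background pixels.
        class_weight = fg_weight if t == 1 else bg_weight
        # (1 - p_t) ** gamma shrinks the loss of well-segmented pixels.
        total += -class_weight * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)
```

With `gamma=0` the modulating factor disappears and the function reduces to class-weighted cross-entropy. A no-target expression corresponds to a mask whose `targets` are all 0, so in this sketch only the background term contributes.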
| Original language | English |
|---|---|
| Article number | 5656017 |
| Journal | IEEE Transactions on Geoscience and Remote Sensing |
| Volume | 63 |
| DOIs | |
| State | Published - 2025 |
Keywords
- Adaptive multimodal feature fusion (AMFF)
- generalized referring remote sensing segmentation
- generalized referring remote sensing segmentation dataset
- online hard case mining loss
Title: GRRSIS: Generalized Referring Remote Sensing Image Segmentation