TY - JOUR
T1 - Uncertainty-guided hierarchical frequency domain Transformer for image restoration
AU - Shao, Mingwen
AU - Qiao, Yuanjian
AU - Meng, Deyu
AU - Zuo, Wangmeng
N1 - Publisher Copyright: © 2023 Elsevier B.V.
PY - 2023/3/5
Y1 - 2023/3/5
AB - Existing convolutional neural network (CNN)-based and vision Transformer (ViT)-based image restoration methods usually operate in the spatial domain. However, we employ Fourier analysis to show that these spatial-domain models cannot perceive the entire frequency spectrum of images; i.e., they mainly focus on either high-frequency components (CNN-based models) or low-frequency components (ViT-based models). This intrinsic limitation results in the partial loss of semantic information and the appearance of artifacts. To address this limitation, we propose a novel uncertainty-guided hierarchical frequency domain Transformer, named HFDT, that effectively learns both high- and low-frequency information while perceiving local and global features. Specifically, to aggregate semantic information from various frequency levels, we propose a dual-domain feature interaction mechanism in which global frequency information and local spatial features are extracted by corresponding branches. The frequency domain branch adopts the Fast Fourier Transform (FFT) to convert features from the spatial domain to the frequency domain, where the global low- and high-frequency components are learned with log-linear complexity. Complementarily, an efficient convolution group is employed in the spatial domain branch to capture local high-frequency details. Moreover, we introduce an uncertainty degradation-guided strategy that efficiently represents prior information about degradation, rather than simply distinguishing degraded/non-degraded regions in binary form. Our approach achieves competitive results in several degradation scenarios, including rain streaks, raindrops, motion blur, and defocus blur.
KW - Frequency-domain Transformer
KW - Image restoration
KW - Log-linear complexity
KW - Uncertainty-guided
UR - https://www.scopus.com/pages/publications/85146434685
U2 - 10.1016/j.knosys.2023.110306
DO - 10.1016/j.knosys.2023.110306
M3 - Article
AN - SCOPUS:85146434685
SN - 0950-7051
VL - 263
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
M1 - 110306
ER -