TY - JOUR
T1 - Exploiting Multi-Scale Parallel Self-Attention and Local Variation via Dual-Branch Transformer-CNN Structure for Face Super-Resolution
AU - Shi, Jingang
AU - Wang, Yusi
AU - Yu, Zitong
AU - Li, Guanxin
AU - Hong, Xiaopeng
AU - Wang, Fei
AU - Gong, Yihong
N1 - Publisher Copyright:
© 1999-2012 IEEE.
PY - 2024
Y1 - 2024
N2 - Recently, deep learning techniques have been widely employed to address the face super-resolution (FSR) problem. FSR aims to learn the nonlinear mapping between low-resolution (LR) face images and their corresponding high-resolution (HR) counterparts, so that high-frequency details can be recovered from degraded LR textures. However, both CNN-based and Transformer-based approaches mostly enhance details by exploiting the relationships among local pixels or patches in LR features, so nonlocal features are not fully exploited for producing high-frequency textures. To address this problem, we design a novel dual-branch module that consists of a Transformer branch and a CNN branch. The Transformer branch extracts multi-scale feature embeddings and explores local and nonlocal self-attention simultaneously; this parallel self-attention mechanism is therefore better able to capture both local and nonlocal dependencies in face images during reconstruction. Furthermore, traditional CNNs usually extract features by combining pixels within a local convolutional kernel, which may be ineffective for recovering lost high-frequency details because the variations among local pixels, which are important for recovering vivid edges and contours, are not well measured. To this end, we propose a local-variation-based attention block in the CNN branch, which strengthens feature extraction by operating directly on the variations of neighboring pixels. Finally, the Transformer branch and CNN branch are combined by a modulation block that fuses the nonlocal and local advantages of the two branches. Experimental results demonstrate the effectiveness of the proposed method compared with state-of-the-art approaches.
AB - Recently, deep learning techniques have been widely employed to address the face super-resolution (FSR) problem. FSR aims to learn the nonlinear mapping between low-resolution (LR) face images and their corresponding high-resolution (HR) counterparts, so that high-frequency details can be recovered from degraded LR textures. However, both CNN-based and Transformer-based approaches mostly enhance details by exploiting the relationships among local pixels or patches in LR features, so nonlocal features are not fully exploited for producing high-frequency textures. To address this problem, we design a novel dual-branch module that consists of a Transformer branch and a CNN branch. The Transformer branch extracts multi-scale feature embeddings and explores local and nonlocal self-attention simultaneously; this parallel self-attention mechanism is therefore better able to capture both local and nonlocal dependencies in face images during reconstruction. Furthermore, traditional CNNs usually extract features by combining pixels within a local convolutional kernel, which may be ineffective for recovering lost high-frequency details because the variations among local pixels, which are important for recovering vivid edges and contours, are not well measured. To this end, we propose a local-variation-based attention block in the CNN branch, which strengthens feature extraction by operating directly on the variations of neighboring pixels. Finally, the Transformer branch and CNN branch are combined by a modulation block that fuses the nonlocal and local advantages of the two branches. Experimental results demonstrate the effectiveness of the proposed method compared with state-of-the-art approaches.
KW - Face hallucination
KW - Transformer
KW - neural networks
KW - super-resolution
UR - https://www.scopus.com/pages/publications/85166746272
U2 - 10.1109/TMM.2023.3301225
DO - 10.1109/TMM.2023.3301225
M3 - Article
AN - SCOPUS:85166746272
SN - 1520-9210
VL - 26
SP - 2608
EP - 2620
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
ER -
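
The abstract above describes a dual-branch Transformer-CNN block but the record contains no implementation details. The following is a minimal, illustrative PyTorch sketch of the general idea only: the class names (DualBranchBlock, LocalVariationAttention), the four-neighbor difference scheme, and the 1x1-convolution fusion are assumptions for illustration, not the authors' actual architecture, which also includes multi-scale feature embeddings omitted here.

import torch
import torch.nn as nn

class LocalVariationAttention(nn.Module):
    # Hypothetical sketch: attend over differences between each pixel and
    # its four axis-aligned neighbors, rather than over raw intensities,
    # so edge-like local variations drive the attention weights.
    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(4 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Zero-pad, then take differences to the 4 nearest neighbors.
        pad = nn.functional.pad(x, (1, 1, 1, 1))
        up    = pad[:, :, :-2, 1:-1] - x
        down  = pad[:, :, 2:,  1:-1] - x
        left  = pad[:, :, 1:-1, :-2] - x
        right = pad[:, :, 1:-1, 2:]  - x
        variation = torch.cat([up, down, left, right], dim=1)
        return x * self.attn(variation)

class DualBranchBlock(nn.Module):
    # Transformer branch (nonlocal self-attention over all pixels) plus a
    # CNN branch (local variation attention), fused by a learned 1x1
    # projection standing in for the paper's modulation block.
    def __init__(self, channels, heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.mhsa = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.cnn = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            LocalVariationAttention(channels),
        )
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        # Transformer branch: global self-attention over flattened pixels.
        tokens = x.flatten(2).transpose(1, 2)          # (B, HW, C)
        t = self.norm(tokens)
        t, _ = self.mhsa(t, t, t)
        t = (tokens + t).transpose(1, 2).reshape(b, c, h, w)
        # CNN branch: local variation attention on the same input.
        l = self.cnn(x)
        # Fuse nonlocal and local features with a residual connection.
        return x + self.fuse(torch.cat([t, l], dim=1))

if __name__ == "__main__":
    block = DualBranchBlock(channels=32)
    lr_feat = torch.randn(1, 32, 16, 16)
    print(block(lr_feat).shape)  # torch.Size([1, 32, 16, 16])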