TY - JOUR
T1 - Towards Gradient-Based Saliency Consensus Training for Adversarial Robustness
AU - Li, Qian
AU - Shen, Chao
AU - Hu, Qingyuan
AU - Lin, Chenhao
AU - Ji, Xiang
AU - Qi, Saiyu
N1 - Publisher Copyright: © 2004-2012 IEEE.
PY - 2024/3/1
Y1 - 2024/3/1
AB - Recent work has shown that robust networks consistently exhibit more discriminative saliency maps, which have been shown to indicate sufficient adversarial robustness. In existing robust training paradigms such as adversarial training, however, the progressive saliency information describing which semantic input features a model's prediction relies on has not yet been fully explored. We therefore consider incorporating the posterior saliency properties of robust models into training as an efficient supervision signal for robust learning, which offers an alternative direction for enhancing robustness from the perspective of saliency interpretability. In this article, to harden a model, we propose optimizing the discrimination of intermediate gradient-based saliency and maintaining its consensus during training, which encourages the model to behave according to task-relevant features from salient regions, such as object edges in an image. We then introduce the Adversarially Gradient-based Saliency Consensus Training method, dubbed Adv-GSCT. Within it, we preserve the similarity between the learned model saliency and a target saliency used as a label, which is approximated in the most offending case, representing the scenario with the least but essential information. Meanwhile, a constructed pseudo-input coupled with feature importance is fed into the model to ensure the discrimination of the estimated target saliency. Besides providing a novel insight into adversarial defense, Adv-GSCT differs from the currently most effective adversarial training in that it does not need multiple iterative generations of adversarial perturbations, whose computational cost and sensitivity to the direction of the prediction are concerns. Finally, extensive performance evaluations on the MNIST, CIFAR-10, and ImageNet datasets demonstrate the superiority of the proposed method.
KW - Adversarial robustness
KW - deep neural networks
KW - saliency consensus
UR - https://www.scopus.com/pages/publications/85133739076
U2 - 10.1109/TDSC.2022.3184594
DO - 10.1109/TDSC.2022.3184594
M3 - Article
AN - SCOPUS:85133739076
SN - 1545-5971
VL - 21
SP - 530
EP - 541
JO - IEEE Transactions on Dependable and Secure Computing
JF - IEEE Transactions on Dependable and Secure Computing
IS - 2
ER -