TY - JOUR
T1 - Improving query efficiency of black-box attacks via the preference of deep learning models
AU - Yang, Xiangyuan
AU - Lin, Jie
AU - Zhang, Hanlin
AU - Zhao, Peng
N1 - Publisher Copyright:
© 2024 Elsevier Inc.
PY - 2024/9
Y1 - 2024/9
N2 - Black-box query attacks are effective at compromising deep-learning models using only the model's output. These attacks typically face challenges with low attack success rates (ASRs) when limited to fewer than ten queries per example. Recent approaches have improved ASRs due to the transferability of initial perturbations, yet they still suffer from inefficient querying. Our study introduces the Gradient-Aligned Attack (GAA) to enhance ASRs with minimal perturbation by focusing on the model's preference. We define a preference property where the generated adversarial example prefers to be misclassified as the wrong category with a high initial confidence. This property is further elucidated by the gradient preference, suggesting a positive correlation between the magnitude of a coefficient in a partial derivative and the norm of the derivative itself. Utilizing this, we devise the gradient-aligned CE (GACE) loss to precisely estimate gradients by aligning these coefficients between the surrogate and victim models, with coefficients assessed by the victim model's outputs. GAA, based on the GACE loss, also aims to achieve the smallest perturbation. Our tests on ImageNet, CIFAR10, and Imagga API show that GAA can increase ASRs by 25.7% and 40.3% for untargeted and targeted attacks respectively, while only needing minimally disruptive perturbations. Furthermore, the GACE loss reduces the number of necessary queries by up to 2.5x and enhances the transferability of advanced attacks by up to 14.2%, especially when using an ensemble surrogate model. Code is available at https://github.com/HaloMoto/GradientAlignedAttack.
AB - Black-box query attacks are effective at compromising deep-learning models using only the model's output. These attacks typically face challenges with low attack success rates (ASRs) when limited to fewer than ten queries per example. Recent approaches have improved ASRs due to the transferability of initial perturbations, yet they still suffer from inefficient querying. Our study introduces the Gradient-Aligned Attack (GAA) to enhance ASRs with minimal perturbation by focusing on the model's preference. We define a preference property where the generated adversarial example prefers to be misclassified as the wrong category with a high initial confidence. This property is further elucidated by the gradient preference, suggesting a positive correlation between the magnitude of a coefficient in a partial derivative and the norm of the derivative itself. Utilizing this, we devise the gradient-aligned CE (GACE) loss to precisely estimate gradients by aligning these coefficients between the surrogate and victim models, with coefficients assessed by the victim model's outputs. GAA, based on the GACE loss, also aims to achieve the smallest perturbation. Our tests on ImageNet, CIFAR10, and Imagga API show that GAA can increase ASRs by 25.7% and 40.3% for untargeted and targeted attacks respectively, while only needing minimally disruptive perturbations. Furthermore, the GACE loss reduces the number of necessary queries by up to 2.5x and enhances the transferability of advanced attacks by up to 14.2%, especially when using an ensemble surrogate model. Code is available at https://github.com/HaloMoto/GradientAlignedAttack.
KW - Black-box query attack
KW - Gradient preference
KW - Gradient-aligned attack
KW - Gradient-aligned cross-entropy loss
KW - Preference property
UR - https://www.scopus.com/pages/publications/85196012469
U2 - 10.1016/j.ins.2024.121013
DO - 10.1016/j.ins.2024.121013
M3 - 文章
AN - SCOPUS:85196012469
SN - 0020-0255
VL - 678
JO - Information Sciences
JF - Information Sciences
M1 - 121013
ER -