TY - GEN
T1 - UTIO
T2 - 28th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2022
AU - Zhao, Cui
AU - Li, Zhenjiang
AU - Ding, Han
AU - Xi, Wei
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
AB - Audio adversarial examples have been demonstrated to be an effective attack that causes prediction errors in intelligent voice control systems (e.g., deep neural network based speech recognition services) while sounding like valid input to human listeners. An ideal adversarial example attack should have four major properties: 1) it uses a universal adversarial perturbation that works against arbitrary voice commands, 2) it tricks the model into producing an incorrect, targeted result, 3) it is imperceptible to users even in a silent place, and 4) it remains valid in an over-the-air (OTA) scenario. However, existing studies satisfy some but not all of these criteria. In this paper, we propose UTIO, a universal, targeted, imperceptible and OTA audio adversarial example design, which uses a single perturbation to fool a speech recognition model in OTA scenarios. Moreover, a variety of speech inputs can be imperceptibly misled into a targeted threat command. To achieve these benefits, we use two targeted loss functions to generate adversarial perturbations and employ the psychoacoustic principle to further conceal the attack. Finally, we actively embed the additional distortions that occur during physical propagation into the perturbation generation process so that UTIO remains valid in an OTA scenario. Extensive experiments show that UTIO achieves a 94.15% attack success rate locally, i.e., without physical propagation, while retaining a 93.44% attack success rate in an OTA scenario. In addition, three types of defensive strategies are introduced to resist our attack.
KW - Adversarial Example
KW - Machine Learning
KW - Speech Recognition
KW - Voice control systems
UR - https://www.scopus.com/pages/publications/85152925921
U2 - 10.1109/ICPADS56603.2022.00052
DO - 10.1109/ICPADS56603.2022.00052
M3 - Conference contribution
AN - SCOPUS:85152925921
T3 - Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS
SP - 346
EP - 353
BT - Proceedings - 2022 IEEE 28th International Conference on Parallel and Distributed Systems, ICPADS 2022
PB - IEEE Computer Society
Y2 - 10 January 2023 through 12 January 2023
ER -