More Simplicity for Trainers, More Opportunity for Attackers: Black-Box Attacks on Speaker Recognition Systems by Inferring Feature Extractor

  • Yunjie Ge
  • Pinji Chen
  • Qian Wang
  • Lingchen Zhao
  • Ningping Mou
  • Peipei Jiang
  • Cong Wang
  • Qi Li
  • Chao Shen

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

4 Scopus citations

Abstract

Recent studies have revealed that deep learning-based speaker recognition systems (SRSs) are vulnerable to adversarial examples (AEs). However, the practicality of existing black-box AE attacks is restricted by the requirement for extensive querying of the target system or the limited attack success rates (ASR). In this paper, we introduce VoxCloak, a new targeted AE attack with superior performance in both these aspects. Distinct from existing methods that optimize AEs by querying the target model, VoxCloak initially employs a small number of queries (e.g., a few hundred) to infer the feature extractor used by the target system. It then utilizes this feature extractor to generate any number of AEs locally without the need for further queries. We evaluate VoxCloak on four commercial speaker recognition (SR) APIs and seven voice assistants. On the SR APIs, VoxCloak surpasses the existing transfer-based attacks, improving ASR by 76.25% and signal-to-noise ratio (SNR) by 13.46 dB, as well as the decision-based attacks, requiring 33 times fewer queries and improving SNR by 7.87 dB while achieving comparable ASRs. On the voice assistants, VoxCloak outperforms the existing methods with a 49.40% improvement in ASR and a 15.79 dB improvement in SNR.

Original language: English
Title of host publication: Proceedings of the 33rd USENIX Security Symposium
Publisher: USENIX Association
Pages: 2973-2990
Number of pages: 18
ISBN (Electronic): 9781939133441
State: Published - 2024
Event: 33rd USENIX Security Symposium, USENIX Security 2024 - Philadelphia, United States
Duration: 14 Aug 2024 to 16 Aug 2024

Publication series

Name: Proceedings of the 33rd USENIX Security Symposium

Conference

Conference: 33rd USENIX Security Symposium, USENIX Security 2024
Country/Territory: United States
City: Philadelphia
Period: 14/08/24 to 16/08/24
