
SoK: Dataset Copyright Auditing in Machine Learning Systems

  • Linkang Du
  • Xuanru Zhou
  • Min Chen
  • Chusong Zhang
  • Zhou Su
  • Peng Cheng
  • Jiming Chen
  • Zhikun Zhang
  • Xi'an Jiaotong University
  • Zhejiang University
  • Vrije Universiteit Amsterdam
  • Hangzhou Dianzi University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Citations (Scopus)

Abstract

As machine learning (ML) systems become more widely deployed, especially with the introduction of larger ML models, we observe a surging demand for massive amounts of data. However, this inevitably leads to data infringement and misuse, such as training ML models on unauthorized online artworks or face images. To address this problem, many efforts have been made to audit the copyright of model training datasets. However, existing solutions vary in their auditing assumptions and capabilities, making it difficult to compare their strengths and weaknesses. In addition, robustness evaluations usually consider only part of the ML pipeline and hardly reflect how the algorithms perform in real-world ML applications. Thus, it is essential to examine current dataset copyright auditing tools from a practical deployment perspective, assessing both their effectiveness and their limitations. Concretely, we categorize dataset copyright auditing research into two prominent strands, intrusive methods and non-intrusive methods, depending on whether they require modifications to the original dataset. We then break down the intrusive methods by watermark injection option and examine the non-intrusive methods by the fingerprints they use. To summarize our results, we offer detailed reference tables, highlight key points, and pinpoint unresolved issues in the current literature. By considering the full ML system pipeline and analyzing prior studies, we highlight several future directions for making auditing tools better suited to real-world copyright protection requirements.

Original language: English
Host publication title: Proceedings - 46th IEEE Symposium on Security and Privacy, SP 2025
Editors: Marina Blanton, William Enck, Cristina Nita-Rotaru
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2076-2094
Number of pages: 19
ISBN (Electronic): 9798331522360
DOI
Publication status: Published - 2025
Event: 46th IEEE Symposium on Security and Privacy, SP 2025 - San Francisco, United States
Duration: 12 May 2025 → 15 May 2025

Publication series

Name: Proceedings - IEEE Symposium on Security and Privacy
ISSN (Print): 1081-6011

Conference

Conference: 46th IEEE Symposium on Security and Privacy, SP 2025
Country/Territory: United States
City: San Francisco
Period: 12/05/25 → 15/05/25
