
SEMSTAMP: A Semantic Watermark with Paraphrastic Robustness for Text Generation

  • Abe Bohan Hou
  • Jingyu Zhang
  • Tianxing He
  • Yichen Wang
  • Yung Sung Chuang
  • Hongwei Wang
  • Lingfeng Shen
  • Benjamin Van Durme
  • Daniel Khashabi
  • Yulia Tsvetkov
  • Johns Hopkins University
  • University of Washington
  • Massachusetts Institute of Technology
  • Tencent

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

39 citations (Scopus)

Abstract

Existing watermarked generation algorithms employ token-level designs and are therefore vulnerable to paraphrase attacks. To address this issue, we introduce watermarking on the semantic representation of sentences. We propose SEMSTAMP, a robust sentence-level semantic watermarking algorithm that uses locality-sensitive hashing (LSH) to partition the semantic space of sentences. The algorithm encodes and LSH-hashes a candidate sentence generated by a language model, and conducts rejection sampling until the sampled sentence falls in watermarked partitions in the semantic embedding space. To test the paraphrastic robustness of watermarking algorithms, we propose a “bigram paraphrase” attack that produces paraphrases with small bigram overlap with the original sentence. This attack is shown to be effective against existing token-level watermark algorithms, while posing only minor degradations to SEMSTAMP. Experimental results show that our novel semantic watermark algorithm is not only more robust than the previous state-of-the-art method on various paraphrasers and domains, but also better at preserving the quality of generation.
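The encode, hash, and rejection-sample loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes random-hyperplane LSH (the abstract names LSH but not the variant), a fixed set of "valid" partitions (the abstract does not say how watermarked partitions are chosen), and a toy pseudo-random embedding in place of a real sentence encoder. All function names (`make_hyperplanes`, `lsh_signature`, `rejection_sample`, `toy_embed`) are hypothetical.

```python
import random


def make_hyperplanes(n_bits, dim, seed=0):
    """Draw n_bits random Gaussian normal vectors (assumed LSH variant)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_signature(vec, hyperplanes):
    """Map an embedding to an integer partition id, one sign bit per hyperplane."""
    sig = 0
    for plane in hyperplanes:
        dot = sum(v * p for v, p in zip(vec, plane))
        sig = (sig << 1) | (1 if dot > 0 else 0)
    return sig


def rejection_sample(propose, embed, hyperplanes, valid, max_tries=1000):
    """Sample candidate sentences until one lands in a valid partition.

    propose: zero-argument callable returning a candidate sentence
             (stands in for a language model).
    embed:   callable mapping a sentence to a list of floats
             (stands in for a sentence encoder).
    Returns the accepted sentence, or None if max_tries is exhausted.
    """
    for _ in range(max_tries):
        sentence = propose()
        if lsh_signature(embed(sentence), hyperplanes) in valid:
            return sentence
    return None


def toy_embed(sentence, dim=8):
    """Hypothetical stand-in encoder: a deterministic pseudo-random vector.

    It carries no semantic information and only makes the sketch runnable.
    """
    rng = random.Random(sentence)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


if __name__ == "__main__":
    planes = make_hyperplanes(n_bits=3, dim=8, seed=1)
    pool = ["The cat sat.", "A dog barked.", "Rain fell all day."]
    # Assumption for the demo: mark the partition of one pool sentence as valid.
    valid = {lsh_signature(toy_embed(pool[1]), planes)}
    candidates = iter(pool * 10)
    accepted = rejection_sample(lambda: next(candidates), toy_embed, planes, valid)
    print(accepted, lsh_signature(toy_embed(accepted), planes) in valid)
```

Under the same assumptions, detection would recompute each sentence's signature and count how many fall in valid partitions. The robustness claim in the abstract rests on a real semantic encoder placing a paraphrase near the original sentence, which the toy embedding here does not model.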

Original language: English
Title of host publication: Long Papers
Editors: Kevin Duh, Helena Gomez, Steven Bethard
Publisher: Association for Computational Linguistics (ACL)
Pages: 4067-4082
Number of pages: 16
ISBN (electronic): 9798891761148
DOI
Publication status: Published - 2024
Event: 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2024 - Hybrid, Mexico City, Mexico
Duration: 16 Jun 2024 to 21 Jun 2024

Publication series

Name: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2024
Volume: 1

Conference

Conference: 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2024
Country/Territory: Mexico
City: Hybrid, Mexico City
Period: 16/06/24 to 21/06/24
