LVLM-FDA: Protecting Large Vision-Language Models via Fast Detection of Malicious Attempts

  • Boxu Chen
  • Chaoyi Wang
  • Le Yang
  • Ziwei Zheng
  • Cong Wang
  • Qian Wang
  • Chao Shen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Despite the impressive advancements of large vision-language models (LVLMs) in image understanding and reasoning, their susceptibility to safety risks, such as jailbreak attacks, remains a significant challenge for real-world applications. To address this, we propose a fast yet effective protection approach, named LVLM-FDA, which detects malicious attempts in inputs by leveraging the internal representations of LVLMs. By examining the representations across different attention heads, we aim to identify the most discriminative malicious features, which can be distinguished from benign ones with high generalization accuracy. To this end, we introduce a metric called separation probability, which provides a lower bound on the generalization accuracy of a classifier tasked with binary classification of malicious features. By selecting the attention heads whose representations yield the highest separation probability between malicious and benign inputs, we build a detector that identifies potentially harmful content in outputs. This detector can be seamlessly integrated into the generation process with minimal computational overhead during inference, offering a strong harmful-response detector for modern LVLMs. It can further be combined with an identification prompt to mitigate safety risks. Our experiments on various prompt-based attacks show that our method reduces inference time by at least 15% while achieving better defense performance than existing methods and preserving the general abilities of LVLMs, demonstrating the effectiveness and efficiency of our approach in securing LVLMs. The code for our method is available at https://github.com/Chen-Boxu/LVLM-FDA.
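The abstract does not give the exact formula for separation probability or the head-selection procedure; the sketch below only illustrates the general recipe it describes. Per-head representations of benign and malicious prompts (simulated here with synthetic Gaussians) are scored head by head, with cross-validated linear-probe accuracy standing in for separation probability, the top-scoring heads are kept, and a lightweight binary classifier over their concatenated features plays the role of the detector. All names, the synthetic features, and the probe-accuracy proxy are illustrative assumptions, not the authors' implementation.

# Minimal sketch of separation-based head selection, NOT the paper's code.
# Assumption: per-head features are already extracted from the LVLM; here
# they are simulated, and held-out linear-probe accuracy is used as a proxy
# for the paper's separation probability metric.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
N_HEADS, DIM, N = 32, 64, 200  # heads, per-head feature dim, prompts per class

# Synthetic per-head features, shape (n_heads, n_samples, dim).
# A few heads are made discriminative by shifting the malicious mean.
benign = rng.normal(0.0, 1.0, (N_HEADS, N, DIM))
malicious = rng.normal(0.0, 1.0, (N_HEADS, N, DIM))
for h in (3, 11, 17):  # hypothetical "safety-relevant" heads
    malicious[h] += 1.5

X = np.concatenate([benign, malicious], axis=1)  # (heads, 2N, dim)
y = np.concatenate([np.zeros(N), np.ones(N)])    # 0 = benign, 1 = malicious

def head_score(feats, labels):
    """Proxy for separation probability: cross-validated accuracy of a
    linear probe on this head's features (an assumption, not the paper's
    exact metric)."""
    probe = LogisticRegression(max_iter=1000)
    return cross_val_score(probe, feats, labels, cv=5).mean()

scores = np.array([head_score(X[h], y) for h in range(N_HEADS)])
top_heads = np.argsort(scores)[::-1][:4]  # keep the most separable heads
print("selected heads:", top_heads, "scores:", scores[top_heads].round(3))

# Detector: a single linear classifier over the selected heads' features.
X_sel = np.concatenate([X[h] for h in top_heads], axis=1)
detector = LogisticRegression(max_iter=1000).fit(X_sel, y)
print("train accuracy:", detector.score(X_sel, y).round(3))

Because the detector reads only a few selected heads' representations, which the model computes anyway during generation, this structure is consistent with the abstract's claim of minimal inference-time overhead.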

Original language: English
Title of host publication: Knowledge Science, Engineering and Management - 18th International Conference, KSEM 2025, Proceedings
Editors: Tianqing Zhu, Wanlei Zhou, Congcong Zhu
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 17-34
Number of pages: 18
ISBN (Print): 9789819530007
DOIs
State: Published - 2026
Event: 18th International Conference on Knowledge Science, Engineering and Management, KSEM 2025 - Macao, China
Duration: 4 Aug 2025 – 7 Aug 2025

Publication series

Name: Lecture Notes in Computer Science
Volume: 15919 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 18th International Conference on Knowledge Science, Engineering and Management, KSEM 2025
Country/Territory: China
City: Macao
Period: 4/08/25 – 7/08/25

Keywords

  • AI security
  • Large vision-language models
  • LVLM safety
  • Separation probability
