Biomed-DPT: Dual Modality Prompt Tuning for Biomedical Vision-Language Models

  • Xi'an Jiaotong University

Research output: Contribution to journal, article, peer-reviewed

Abstract

Prompt learning has emerged as one of the most effective paradigms for adapting pre-trained vision-language models (VLMs) to biomedical image classification in few-shot scenarios. However, most existing prompt learning methods rely on a single textual prompt and often ignore the particular visual structures of biomedical images (e.g., complex anatomical structures and subtle pathological features). In this work, we propose Biomed-DPT, a knowledge-enhanced dual-modality prompt tuning framework. For text prompts, Biomed-DPT constructs a dual prompt consisting of template-driven ensemble clinical prompts and large language model (LLM)-driven expert domain-adapted prompts. These prompts are systematically ranked, and a neural network searches for their optimal combination. A semantic regularization loss is then applied to extract clinical knowledge while mitigating semantic discrepancies. For visual prompts, Biomed-DPT introduces zero vectors as soft prompts and uses attention re-weighting to avoid focusing on non-diagnostic regions and recognizing non-critical pathological features. Biomed-DPT achieves an average classification accuracy of 66.28% across 11 biomedical image datasets covering 9 modalities and 10 organs, with performance reaching 79.54% on base classes and 76.91% on novel classes.
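The attention re-weighting effect of zero-vector prompts can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a single attention head in which the appended prompt tokens act directly as all-zero keys and values (in a real transformer they would first pass through learned projections), and the helper names `attention` and `with_zero_prompts` are hypothetical. Under these assumptions, each zero key contributes a logit of 0 to the softmax, so it absorbs some attention mass without adding anything to the output.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Single-head scaled dot-product attention.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights, weights @ v

def with_zero_prompts(k, v, num_prompts):
    # Hypothetical helper: append all-zero key/value rows as "soft prompts".
    # Assumption: the zero vectors are used as keys and values unchanged.
    zeros_k = np.zeros((num_prompts, k.shape[1]))
    zeros_v = np.zeros((num_prompts, v.shape[1]))
    return np.vstack([k, zeros_k]), np.vstack([v, zeros_v])

rng = np.random.default_rng(0)
q = rng.normal(size=(3, 8))   # 3 query tokens
k = rng.normal(size=(5, 8))   # 5 image-patch keys
v = rng.normal(size=(5, 8))   # 5 image-patch values

w_base, out_base = attention(q, k, v)
k_p, v_p = with_zero_prompts(k, v, num_prompts=2)
w_prompt, out_prompt = attention(q, k_p, v_p)

# Attention mass that remains on the original patches for each query.
patch_mass = w_prompt[:, :5].sum(axis=1)
print(patch_mass)
```

In this simplified setting, the relative weights among the original patches are unchanged, but each query's output is scaled by the mass left on the patches. Queries whose logits for all patches are low lose the most mass to the zero prompts, while queries with a strong match are barely affected.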

Original language: English
Journal: IEEE Journal of Biomedical and Health Informatics
DOI
Publication status: Accepted/In press - 2026

