TY - JOUR
T1 - Biomed-DPT
T2 - Dual Modality Prompt Tuning for Biomedical Vision-Language Models
AU - Peng, Wei
AU - Hu, Jianchen
AU - Liu, Kang
AU - Zhang, Meng
N1 - Publisher Copyright:
© 2013 IEEE.
PY - 2026
Y1 - 2026
N2 - Prompt learning has emerged as one of the most effective paradigms for adapting pre-trained vision-language models (VLMs) to biomedical image classification tasks in few-shot scenarios. However, most existing prompt learning methods rely on a single textual prompt and often ignore the distinctive visual structures of biomedical images (e.g., complex anatomical structures and subtle pathological features). In this work, we propose Biomed-DPT, a knowledge-enhanced dual-modality prompt tuning framework. For text prompts, Biomed-DPT constructs a dual prompt comprising template-driven ensemble clinical prompts and large language model (LLM)-driven, expert domain-adapted prompts. These prompts are systematically ranked, and a neural network is used to search for their optimal combination. A semantic regularization loss is then applied to extract clinical knowledge while mitigating semantic discrepancies. For visual prompts, Biomed-DPT introduces zero vectors as soft prompts that exploit attention re-weighting to keep the model from focusing on non-diagnostic regions and recognizing non-critical pathological features. Biomed-DPT achieves an average classification accuracy of 66.28% across 11 biomedical image datasets covering 9 modalities and 10 organs, reaching 79.54% on base classes and 76.91% on novel classes.
AB - Prompt learning has emerged as one of the most effective paradigms for adapting pre-trained vision-language models (VLMs) to biomedical image classification tasks in few-shot scenarios. However, most existing prompt learning methods rely on a single textual prompt and often ignore the distinctive visual structures of biomedical images (e.g., complex anatomical structures and subtle pathological features). In this work, we propose Biomed-DPT, a knowledge-enhanced dual-modality prompt tuning framework. For text prompts, Biomed-DPT constructs a dual prompt comprising template-driven ensemble clinical prompts and large language model (LLM)-driven, expert domain-adapted prompts. These prompts are systematically ranked, and a neural network is used to search for their optimal combination. A semantic regularization loss is then applied to extract clinical knowledge while mitigating semantic discrepancies. For visual prompts, Biomed-DPT introduces zero vectors as soft prompts that exploit attention re-weighting to keep the model from focusing on non-diagnostic regions and recognizing non-critical pathological features. Biomed-DPT achieves an average classification accuracy of 66.28% across 11 biomedical image datasets covering 9 modalities and 10 organs, reaching 79.54% on base classes and 76.91% on novel classes.
KW - Dual Modality
KW - Large Language Model
KW - Prompt Learning
KW - Vision Language Model
UR - https://www.scopus.com/pages/publications/105037123221
U2 - 10.1109/JBHI.2026.3686818
DO - 10.1109/JBHI.2026.3686818
M3 - Article
C2 - 42024940
AN - SCOPUS:105037123221
SN - 2168-2194
JO - IEEE Journal of Biomedical and Health Informatics
JF - IEEE Journal of Biomedical and Health Informatics
ER -