TY - GEN
T1 - From Predictions to Analyses
T2 - 34th ACM Web Conference, WWW 2025
AU - Zheng, Xiaofan
AU - Zeng, Zinan
AU - Wang, Heng
AU - Bai, Yuyang
AU - Liu, Yuhan
AU - Luo, Minnan
N1 - Publisher Copyright:
© 2025 Copyright held by the owner/author(s).
PY - 2025/4/28
Y1 - 2025/4/28
N2 - The rapid development of social media has led to a surge of eye-catching fake news on the Internet, with multimodal news comprising both images and text being particularly prevalent. To address the challenges of Multimodal Fake News Detection (MFND), numerous supervised task-specific Multimodal Small Language Models (MSLMs) have been developed. However, these models lack the breadth of knowledge and the depth of language understanding, which results in unsatisfactory adaptability, generalization, and explainability performance. To address these issues, we attempt to introduce Large Vision-Language Models (LVLMs), aiming to leverage the common sense understanding and logical reasoning abilities of LVLMs for the MFND task. We observed that LVLMs can generate reasonable analyses of news content from specific angles. However, when it comes to synthesizing these analyses for final judgment, their performance declines significantly, failing to meet the accuracy benchmarks set by existing MSLMs detection models. This reflects the need for a more effective way for LVLMs, which have not undergone task-specific training, to utilize their knowledge and capabilities. Based on these findings, we propose the Explainable Adaptive Rationale-Augmented Multimodal (EARAM) framework, which adaptively uses MSLMs to extract useful rationales from the multi-perspective analyses of LVLMs. After making judgments based on these rationales, EARAM then assists LVLMs in generating more reliable explanations. Extensive experiments demonstrate that our model not only achieves state-of-the-art results on widely used datasets but also significantly outperforms other models in terms of generalization and explainability.
AB - The rapid development of social media has led to a surge of eye-catching fake news on the Internet, with multimodal news comprising both images and text being particularly prevalent. To address the challenges of Multimodal Fake News Detection (MFND), numerous supervised task-specific Multimodal Small Language Models (MSLMs) have been developed. However, these models lack the breadth of knowledge and the depth of language understanding, which results in unsatisfactory adaptability, generalization, and explainability performance. To address these issues, we attempt to introduce Large Vision-Language Models (LVLMs), aiming to leverage the common sense understanding and logical reasoning abilities of LVLMs for the MFND task. We observed that LVLMs can generate reasonable analyses of news content from specific angles. However, when it comes to synthesizing these analyses for final judgment, their performance declines significantly, failing to meet the accuracy benchmarks set by existing MSLMs detection models. This reflects the need for a more effective way for LVLMs, which have not undergone task-specific training, to utilize their knowledge and capabilities. Based on these findings, we propose the Explainable Adaptive Rationale-Augmented Multimodal (EARAM) framework, which adaptively uses MSLMs to extract useful rationales from the multi-perspective analyses of LVLMs. After making judgments based on these rationales, EARAM then assists LVLMs in generating more reliable explanations. Extensive experiments demonstrate that our model not only achieves state-of-the-art results on widely used datasets but also significantly outperforms other models in terms of generalization and explainability.
KW - Explainable
KW - Fake News Detection
KW - Large Vision-Language Models
UR - https://www.scopus.com/pages/publications/105005141945
U2 - 10.1145/3696410.3714532
DO - 10.1145/3696410.3714532
M3 - 会议稿件
AN - SCOPUS:105005141945
T3 - WWW 2025 - Proceedings of the ACM Web Conference
SP - 5364
EP - 5375
BT - WWW 2025 - Proceedings of the ACM Web Conference
PB - Association for Computing Machinery, Inc
Y2 - 28 April 2025 through 2 May 2025
ER -