IEEE transactions on image processing : a publication of the IEEE Signal Processing Society

Boosting Faithful Multi-Modal LLMs via Complementary Visual Grounding.

Zheren Fu, Zhendong Mao, Lei Zhang, Yongdong Zhang

Published: 2025

DOI: 10.1109/TIP.2025.3644140

Abstract

Multimodal Large Language Models (MLLMs) exhibit impressive performance across vision-language tasks but still face hallucination challenges, where the generated text is factually inconsistent with the visual input. Existing mitigation methods focus o…