IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Boosting Faithful Multi-Modal LLMs via Complementary Visual Grounding.
Zheren Fu, Zhendong Mao, Lei Zhang, Yongdong Zhang
Published: 2025
DOI: 10.1109/TIP.2025.3644140
Abstract
Multimodal Large Language Models (MLLMs) exhibit impressive performance across vision-language tasks, but they still suffer from hallucination, where the generated text is factually inconsistent with the visual input. Existing mitigation methods focus o…