IEEE Transactions on Pattern Analysis and Machine Intelligence

Boosting Multi-Modal Large Language Model With Enhanced Visual Features.

Yiwei Ma, Weihuang Lin, Zhibin Wang, Jiayi Ji, Xiaoshuai Sun, Chia-Wen Lin, Rongrong Ji

Published: 2025

DOI: 10.1109/TPAMI.2025.3644851

Abstract

Recent advancements in computer vision (CV) and large language models (LLMs) have spurred significant interest in multi-modal large language models (MLLMs), which aim to integrate visual and textual modalities for enhanced understanding and generatio…