IEEE Transactions on Pattern Analysis and Machine Intelligence
Boosting Multi-Modal Large Language Model With Enhanced Visual Features.
Yiwei Ma, Weihuang Lin, Zhibin Wang, Jiayi Ji, Xiaoshuai Sun, Chia-Wen Lin, Rongrong Ji
Published: 2025
DOI: 10.1109/TPAMI.2025.3644851
Abstract
Recent advancements in computer vision (CV) and large language models (LLMs) have spurred significant interest in multi-modal large language models (MLLMs), which aim to integrate visual and textual modalities for enhanced understanding and generatio…