IEEE Transactions on Pattern Analysis and Machine Intelligence
Beyond LLaVA-HD: Diving into High-Resolution Multimodal Large Language Models.
YiFan Zhang, Qingsong Wen, Chaoyou Fu, Kun Wang, Xue Wang, Zhang Zhang, Liang Wang, Rong Jin
Published: 2026
DOI: 10.1109/TPAMI.2026.3650761
Abstract
Seeing clearly with high resolution is a foundation of Multimodal Large Language Models (MLLMs), which has been proven to be vital for visual perception and reasoning. Existing works usually employ a straightforward resolution upscaling method, where…