IEEE Transactions on Pattern Analysis and Machine Intelligence

Beyond LLaVA-HD: Diving into High-Resolution Multimodal Large Language Models.

YiFan Zhang, Qingsong Wen, Chaoyou Fu, Kun Wang, Xue Wang, Zhang Zhang, Liang Wang, Rong Jin

Published: 2026. DOI: 10.1109/TPAMI.2026.3650761

Abstract

Seeing clearly with high resolution is a foundation of Multimodal Large Language Models (MLLMs), which has been proven to be vital for visual perception and reasoning. Existing works usually employ a straightforward resolution upscaling method, where…
