IEEE Transactions on Pattern Analysis and Machine Intelligence
Momentor++: Advancing Video Large Language Models With Fine-Grained Long Video Reasoning.
Juncheng Li, Minghe Gao, Xiangnan He, Siliang Tang, Weishi Zheng, Jun Xiao, Meng Wang, Tat-Seng Chua, Yueting Zhuang
Published: 2026
DOI: 10.1109/TPAMI.2026.3656169
Abstract
Large Language Models (LLMs) exhibit remarkable proficiency in understanding and managing text-based tasks. Many works try to transfer these capabilities to the video domain; the resulting models are referred to as Video-LLMs. However, current Video-LLMs can only g…