IEEE Transactions on Pattern Analysis and Machine Intelligence

Momentor++: Advancing Video Large Language Models With Fine-Grained Long Video Reasoning.

Juncheng Li, Minghe Gao, Xiangnan He, Siliang Tang, Weishi Zheng, Jun Xiao, Meng Wang, Tat-Seng Chua, Yueting Zhuang

Published: 2026

DOI: 10.1109/TPAMI.2026.3656169

Abstract

Large Language Models (LLMs) exhibit remarkable proficiency in understanding and handling text-based tasks. Many works attempt to transfer these capabilities to the video domain; the resulting models are referred to as Video-LLMs. However, current Video-LLMs can only g…

