IEEE Transactions on Pattern Analysis and Machine Intelligence
Parse, Align and Aggregate: Graph-driven Compositional Reasoning for Video Question Answering.
Jiangtong Li, Zhaohe Liao, Fengshun Xiao, Tianjiao Li, Qiang Zhang, Haohua Zhao, Li Niu, Guang Chen, Liqing Zhang, Changjun Jiang
Published: 2026
DOI: 10.1109/TPAMI.2026.3650864
Abstract
Video Question Answering (VideoQA) enables machines to interpret and respond to complex video content, advancing human-computer interaction. However, existing multimodal large language models (MLLMs) often provide incomplete or opaque explanations a…