IEEE Transactions on Pattern Analysis and Machine Intelligence

Parse, Align and Aggregate: Graph-driven Compositional Reasoning for Video Question Answering.

Jiangtong Li, Zhaohe Liao, Fengshun Xiao, Tianjiao Li, Qiang Zhang, Haohua Zhao, Li Niu, Guang Chen, Liqing Zhang, Changjun Jiang

Published: 2026. DOI: 10.1109/TPAMI.2026.3650864

Abstract

Video Question-Answering (VideoQA) enables machines to interpret and respond to complex video content, advancing human-computer interaction. However, existing multimodal large language models (MLLMs) often provide incomplete or opaque explanations a…

