IEEE Transactions on Pattern Analysis and Machine Intelligence
Towards Accurate Procedure Planning in Instructional Videos: Visual State Generation Helps Task-Selective Diffusion.
Fen Fang, Muli Yang, Min Wu, Yanhua Yang, Qianli Xu, Joo-Hwee Lim, Xulei Yang, Hongyuan Zhu
Published: 2025
DOI: 10.1109/TPAMI.2025.3641798
Abstract
Procedure planning in instructional videos entails predicting an action sequence that transitions a given start state to a desired goal state. This task is particularly challenging due to two key sources of uncertainty: limited visual observations an…