Robust Disentangled Counterfactual Learning for Physical Audiovisual Commonsense Reasoning