Robust 3D Skeletal Joint Fall Detection in Occluded and Rotated Views Using Data Augmentation and Inference-Time Aggregation.
Maryem Zobi, Lorenzo Bolzani, Youness Tabii, Rachid Oulad Haj Thami
Abstract
Fall detection systems, a critical application of human pose estimation, frequently struggle to achieve real-world robustness because they rely on domain-specific datasets and generalize poorly to novel conditions. Models trained on controlled, canonical camera views often fail when subjects are viewed from new perspectives or are partially occluded, resulting in missed detections or false positives. This study addresses these limitations by proposing the Viewpoint Invariant Robust Aggregation Graph Convolutional Network (VIRA-GCN), an adaptation of the Richly Activated GCN for fall detection. The VIRA-GCN introduces a dual-strategy solution: a synthetic viewpoint generation process that augments the training data, and an efficient inference-time aggregation method that forms consensus-based predictions. We demonstrate that augmenting the Le2i dataset with simulated rotations and occlusions allows a standard pose estimation model to achieve a significant increase in its fall detection capabilities. The VIRA-GCN achieved 99.81% accuracy on the Le2i dataset, confirming its enhanced robustness. Furthermore, the model is suitable for low-resource deployment, using only 4.06 M parameters and achieving a real-time inference latency of 7.50 ms. This work presents a practical and efficient solution for developing a single-camera fall detection system that is robust to viewpoint variations. It also introduces a reusable mapping function that converts Kinect data to the MMPose format, ensuring consistent comparison with state-of-the-art models.
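To make the dual strategy concrete, the following is a minimal sketch of the two ideas named above: simulated rotation and occlusion of 3D skeletal joints, and averaging of predictions over several rotated views at inference time. It is an illustration under stated assumptions, not the VIRA-GCN implementation. The abstract does not specify the rotation axis, the angles, the occlusion mechanism, or the aggregation rule, so rotation about the vertical axis, zeroing of hidden joints, and mean averaging of class probabilities are all assumptions here, as are the function names `rotate_y`, `occlude`, and `aggregate_predict`.

```python
import math

# A skeleton sequence is a list of frames; each frame is a list of (x, y, z) joints.

def rotate_y(frames, angle_deg):
    """Rotate every joint about the vertical (y) axis (assumed rotation axis)."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [[(c * x + s * z, y, -s * x + c * z) for (x, y, z) in frame]
            for frame in frames]

def occlude(frames, joint_ids):
    """Simulate occlusion by zeroing the listed joints in every frame (assumed scheme)."""
    hidden = set(joint_ids)
    return [[(0.0, 0.0, 0.0) if j in hidden else p for j, p in enumerate(frame)]
            for frame in frames]

def aggregate_predict(model, frames, angles=(-30.0, 0.0, 30.0)):
    """Average the model's class probabilities over rotated copies of the input.

    `model` is any callable mapping a sequence to a list of class probabilities;
    the default angles are hypothetical values chosen for illustration.
    """
    outputs = [model(rotate_y(frames, a)) for a in angles]
    return [sum(col) / len(outputs) for col in zip(*outputs)]
```

In this sketch the same `rotate_y` helper serves both stages: during training it would generate additional synthetic viewpoints (optionally combined with `occlude`), and at inference it would produce the views whose predictions are averaged into a single consensus output.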