SHAP-based interpretable machine learning for injury risk prediction in university football players: a multi-dimensional data analysis approach.
Jiacheng Ma, Shengrui Liu, Yuting Pei
Abstract
Sports injury prediction is crucial for the health of university football players, yet existing research focuses predominantly on professional athletes and lacks interpretability. Using the Kaggle "University Football Injury Prediction Dataset" (800 Chinese university players), we constructed a comprehensive 18-feature evaluation system across four dimensions: basic information, training factors, physical fitness, and lifestyle habits. We systematically compared 10 machine learning algorithms. The Support Vector Machine achieved the best performance (95.6% accuracy, 95.7% F1-score, 99.2% ROC-AUC). SHAP interpretability analysis identified stress level (importance: 0.10), sleep duration (0.09), and balance ability (0.08) as key injury risk factors, with psychological stress positively associated with injury risk and adequate sleep and balance ability showing protective effects. Notably, lifestyle factors outweighed traditional physical fitness indicators in importance. Despite promising results, this study's single-dataset design and lack of external validation limit generalizability. Prospective validation is essential before clinical deployment. This work demonstrates the feasibility of interpretable injury risk prediction for university athletes, providing a foundation for evidence-based prevention strategies.
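As an illustration of how a SHAP-style importance ranking is obtained, the following minimal sketch computes exact Shapley values for a toy risk score and averages their absolute values across players. It is not the study's pipeline: the model `toy_risk`, its coefficients, the feature names, the baseline, and the two example players are all hypothetical stand-ins chosen for illustration, and the study's SVM and dataset are not reproduced here.

```python
from itertools import permutations
from math import factorial

# Hypothetical feature names, loosely echoing the abstract's top-ranked factors.
FEATURES = ["stress_level", "sleep_hours", "balance_score"]


def toy_risk(x):
    """Illustrative linear risk score; the coefficients are invented."""
    return (0.5 * x["stress_level"]
            - 0.3 * x["sleep_hours"]
            - 0.2 * x["balance_score"])


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x` relative to `baseline`.

    A feature outside the current coalition keeps its baseline value.
    Each feature's value is its marginal contribution averaged over
    every ordering in which features can be switched on.
    """
    phi = {f: 0.0 for f in FEATURES}
    for order in permutations(FEATURES):
        current = dict(baseline)
        previous = model(current)
        for f in order:
            current[f] = x[f]
            updated = model(current)
            phi[f] += updated - previous
            previous = updated
    n_orders = factorial(len(FEATURES))
    return {f: total / n_orders for f, total in phi.items()}


def mean_abs_importance(model, samples, baseline):
    """Global importance: mean absolute Shapley value per feature."""
    totals = {f: 0.0 for f in FEATURES}
    for x in samples:
        for f, value in shapley_values(model, x, baseline).items():
            totals[f] += abs(value)
    return {f: total / len(samples) for f, total in totals.items()}


# Invented reference point and players, for illustration only.
baseline = {"stress_level": 5.0, "sleep_hours": 7.0, "balance_score": 6.0}
players = [
    {"stress_level": 8.0, "sleep_hours": 5.0, "balance_score": 4.0},
    {"stress_level": 3.0, "sleep_hours": 8.0, "balance_score": 7.0},
]

first_player_phi = shapley_values(toy_risk, players[0], baseline)
importance = mean_abs_importance(toy_risk, players, baseline)
ranking = sorted(importance, key=importance.get, reverse=True)
```

In this toy setting a positive Shapley value pushes the score above the baseline (higher risk) and a negative one pulls it below (a protective effect), which is the sense in which the abstract describes stress as a risk factor and sleep and balance as protective. Exact enumeration is only practical for a handful of features; with 18 features, an approximation such as those provided by the `shap` package would normally be used instead.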