Explainable machine learning for preoperative relapse prediction in molecularly stratified endometrial cancer: A single-center Finnish cohort study.
Sergio Vela Moreno, Masuma Khatun, Annukka Pasanen, Ralf Bützow, Andres Salumets, Mikko Loukovaara, Vijayachitra Modhukur
Abstract
Relapse risk in endometrial carcinoma (EC) is driven by molecular subtype, yet current WHO/ESGO classifications rely on postoperative data, limiting their preoperative use. We developed interpretable machine learning (ML) models to predict relapse timing (none, ≤6 months, >6 months) using exclusively preoperative multimodal data. In a single-center retrospective cohort of 784 EC patients, clinicopathological, molecular, immunohistochemical, and systemic biomarkers were integrated using four feature strategies: Traditional (clinicopathology), ESGO-based (guideline risk groups), TP53 + MMRd (high-risk biology), and POLE (low-risk). Random Forest (RF), Support Vector Machine, k-Nearest Neighbors, and Gradient Boosting (GBM) models were trained with leakage-safe preprocessing and evaluated by area under the curve (AUC), accuracy, recall, and F1 score, with interpretability assessed by SHapley Additive exPlanations (SHAP). The RF-Traditional model achieved the best overall performance (F1 = 0.895, AUC = 0.840), while the GBM-POLE model achieved the highest sensitivity (F1 = 0.886, AUC = 0.842). However, prediction of Late Relapse remained challenging (F1 = 0.31) due to class rarity and heterogeneity. Key predictors of relapse included ARID1A loss, elevated CA125, thrombocytosis, and p16 expression, while high-risk features shared across models were advanced stage, deep myometrial invasion, elevated CA125, and positive cytology. While multi-center validation is essential, our findings support biologically coherent predictions for individualized preoperative risk stratification, particularly for high-risk molecular subtypes.
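The abstract does not describe how the leakage-safe preprocessing was implemented. As a general illustration of the idea only, the sketch below fits a z-score scaler on the training portion of each cross-validation fold and then applies it unchanged to the held-out portion, so that no statistics from held-out patients influence the transformation. The function names, the choice of z-scoring, the 5-fold split, and the synthetic two-column data (loosely standing in for CA125 and platelet count) are all assumptions made for illustration and are not taken from the study.

```python
import random
import statistics


def fit_scaler(rows):
    """Learn per-feature mean and standard deviation from training rows only."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    # A constant column has zero spread; fall back to 1.0 to avoid division by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return means, stds


def transform(rows, scaler):
    """Apply a previously fitted scaler without re-estimating anything."""
    means, stds = scaler
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def k_fold_indices(n, k, seed=0):
    """Shuffle row indices once and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def leakage_safe_folds(X, k=5, seed=0):
    """Yield (scaled_train, scaled_test, scaler), fitting the scaler per fold."""
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [X[i] for i in range(len(X)) if i not in held]
        test = [X[i] for i in held_out]
        scaler = fit_scaler(train)  # statistics come from training rows only
        yield transform(train, scaler), transform(test, scaler), scaler


# Synthetic, hypothetical data: NOT values from the study cohort.
rng = random.Random(42)
X = [[rng.gauss(35, 20), rng.gauss(300, 80)] for _ in range(50)]

for fold_no, (train, test, _) in enumerate(leakage_safe_folds(X, k=5), start=1):
    print(f"fold {fold_no}: {len(train)} training rows, {len(test)} held-out rows")
```

The leaky alternative would be to standardize all 50 rows once before splitting. Fitting inside the loop means each held-out fold is scaled using parameters it never contributed to, which is the property a model-evaluation pipeline needs.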