Machine learning models based on log odds of positive lymph nodes for predicting survival in T1N+ gastric cancer.
Yuchen Liu, Hao Cui, Zhen Yuan, Jinghang Wang, Ruonan An, Rui Li, Jianxin Cui, Bo Wei
Abstract
Background: Although early gastric cancer (EGC) is confined to the mucosal and submucosal layers, lymph node metastasis can still occur and may worsen the prognosis, particularly when the number of examined lymph nodes (ELNs) is inadequate. This study introduces the log odds of positive lymph nodes (LODDS) as a prognostic factor and integrates it with machine learning to improve survival prediction in T1N+ gastric cancer (GC).

Methods: This retrospective study used data from the Surveillance, Epidemiology, and End Results (SEER) Program, with an independent validation cohort from the Chinese People's Liberation Army General Hospital First Medical Center. Predictive factors were selected using LASSO regression and multivariate Cox regression. Cox proportional-hazards (CoxPH), random survival forest (RSF), and XGBoost models were developed to predict overall survival (OS). Model interpretability and feature importance were evaluated with the SHapley Additive exPlanations (SHAP) method.

Results: A total of 419 T1N+ GC patients from the SEER database and 193 from our institution were included. LODDS staging was an independent prognostic factor and showed greater discriminatory power than N staging (C-index 0.65 vs. 0.57). On the Brier score, area under the ROC curve (AUC), and C-index, the RSF model outperformed both the CoxPH and XGBoost models. The RSF model achieved a C-index of 0.79 in the training cohort and 0.80 in the validation cohort, indicating favorable discrimination, and its Brier scores remained below 0.25, supporting acceptable calibration.

Conclusions: Integrating LODDS staging with other clinical features in the RSF model provides a highly accurate tool for predicting survival in T1N+ GC patients.
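The abstract names LODDS and the C-index but does not state the formula, the logarithm base, or the cut-offs used for LODDS staging. The sketch below is therefore written under stated assumptions. It uses the form commonly reported in the literature, LODDS = log((positive nodes + 0.5) / (negative nodes + 0.5)), with the natural logarithm, although some studies use base 10. It also implements Harrell's concordance index, the usual form of the C-index for censored survival data, to show how a stage or risk score such as LODDS or N stage is compared against observed survival. The function names `lodds` and `harrell_c_index` are hypothetical and do not reproduce the authors' code.

```python
import math
from typing import Sequence


def lodds(positive_nodes: int, examined_nodes: int) -> float:
    """Log odds of positive lymph nodes (assumed form, natural logarithm).

    Adding 0.5 to both counts avoids log(0) and division by zero when no
    nodes, or all nodes, are positive.
    """
    if examined_nodes < 0 or not 0 <= positive_nodes <= examined_nodes:
        raise ValueError("need 0 <= positive_nodes <= examined_nodes")
    negative_nodes = examined_nodes - positive_nodes
    return math.log((positive_nodes + 0.5) / (negative_nodes + 0.5))


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's C-index for right-censored data.

    A pair (i, j) is comparable when subject i has an observed event
    (events[i] == 1) strictly before subject j's follow-up time. It is
    concordant when the earlier-failing subject has the higher risk score;
    tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical patient: 3 positive nodes out of 15 examined.
    print(round(lodds(3, 15), 3))  # -1.273
    # Toy cohort (made-up numbers): follow-up times, event flags, risk scores.
    print(harrell_c_index([2, 5, 8, 12], [1, 1, 0, 1], [0.9, 0.3, 0.4, 0.2]))  # 0.8
```

LODDS is symmetric around zero: a patient with no positive nodes and one with all nodes positive receive values of equal magnitude and opposite sign. LODDS also keeps varying with the number of examined nodes when the positive count is fixed, which is the property that motivates its use when ELNs are inadequate. In practice, a maintained survival library would be used instead of this quadratic loop.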