Feature Selection and Model Optimization for Survival Prediction in Patients with Angina Pectoris.
Róbert Bata, Amr Sayed Ghanem, Attila Csaba Nagy
Abstract
Background: With the rapid emergence of novel survival models and feature selection methods, comparing them with traditional approaches is essential to define contexts of optimal performance. Methods: This study systematically evaluates nine survival models combined with nine feature selection methods for predicting the occurrence of angina pectoris using electronic health record (EHR) data from a Hungarian hospital (n = 29,655, features = 1150). Performance was assessed with the concordance index (C-index) and integrated Brier score (IBS) to compare predictive accuracy across methods. Results: Tree-based survival models, particularly gradient-boosted survival (GBS) and random survival forest (RSF), consistently outperformed conventional approaches in terms of C-index, but showed slightly worse calibration, as reflected in their higher IBS values. The best-performing model was RSF, which was optimized using Bayesian hyperparameter tuning. For feature selection, tree-based methods such as Boruta and RSF-based approaches showed superior performance. We further identified clusters of feature selection methods, generated consensus feature sets, and analyzed the internal relationships between the selected features. Survival model performance was also examined over time using the time-dependent area under the curve (AUC) based on the best-performing feature set. Conclusions: Our findings highlight the substantial impact of recent methodological innovations in survival analysis, which offer significant gains in predictive accuracy and efficiency and ultimately support more robust clinical decision-making in the early identification of angina pectoris among patients with diabetes.
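To make the primary accuracy metric concrete, the following is a minimal sketch of Harrell's concordance index in plain Python. It is an illustration only, not the implementation used in the study: the function name `concordance_index` and the toy data are hypothetical, and pairs with tied observed times are simply skipped, which is a simplification relative to common library implementations.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Simplified Harrell's C-index (illustrative sketch).

    times       -- observed follow-up times
    events      -- 1/True if the event occurred, 0/False if censored
    risk_scores -- model output; higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: tied times are skipped in this sketch
        # a is the subject with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # earlier subject was censored: pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier event had the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients, the third one censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.4, 0.5, 0.1]
print(concordance_index(toy_times, toy_events, toy_risk))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking of event times by predicted risk. Because the C-index measures only ranking, it does not capture calibration, which is why a complementary score such as the IBS is reported alongside it.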