Padding interpolation, median imputation, RobustScalar, and particle swarm optimization with heterogeneous classifiers: a robust combination for effective heart disease diagnosis.
Sanjay Dhanka, Ankur Kumar, Surita Maini, Nitin Kumar, Jeewan Singh, Mudassir Khan, Mohamed Abbas, Amel Ksibi
Abstract
Introduction: Heart disease is a leading cause of death worldwide, necessitating accurate early diagnosis. Although machine learning (ML) shows potential for this task, many current models are hindered by data inconsistencies, poor feature selection, and limited robustness. Methods: This study proposes a novel, robust diagnostic framework. It employs advanced data preprocessing using Padding Interpolation for missing values, Median Imputation for outliers, and RobustScalar for scaling to ensure data integrity. A key innovation is an Improved Particle Swarm Optimization (IPSO) algorithm, enhanced with a dynamic inertia weight and a mutation operator to avoid premature convergence. The IPSO performs dual optimization: it selects optimal features and tunes the hyperparameters of five classifiers (Logistic Regression, Linear Discriminant Analysis, Gaussian Naïve Bayes, Support Vector Classifier, and XGBoost). Results: The framework was evaluated on a composite dataset from five public repositories. The proposed IPSO-optimized XGBoost model achieved the best performance at a 90:10 train-test split, with an accuracy of 91.3%, sensitivity of 88.37%, specificity of 93.88%, precision of 92.68%, F1-score of 90.48%, and a Diagnostic Odds Ratio of 116.53. Statistical tests (p < 0.05) confirmed that these improvements over the baselines were significant. The model also demonstrated consistent generalizability on the independent Cleveland and Statlog datasets. Discussion: The results establish that integrating rigorous preprocessing with the hybrid IPSO optimization-classification model yields a highly effective and generalizable pipeline for automated heart disease diagnosis, addressing key limitations of existing approaches.
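The reported metrics are all functions of a single confusion matrix, so their relationship can be illustrated with a short sketch. The abstract does not give the underlying counts; the values below (TP = 38, FN = 5, TN = 46, FP = 3, a 92-sample test set) are an assumption, chosen only because they reproduce the reported percentages. The function name `diagnostic_metrics` is likewise hypothetical and is not taken from the paper.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # true positive rate (recall)
    specificity = tn / (tn + fp)            # true negative rate
    precision = tp / (tp + fp)              # positive predictive value
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)        # harmonic mean of precision and recall
    # Diagnostic Odds Ratio: odds of a positive test in diseased vs. healthy cases.
    # Undefined when fp or fn is zero; callers should guard against that case.
    dor = (tp * tn) / (fp * fn)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "dor": dor,
    }


# ASSUMED counts (not stated in the abstract), picked to match the reported figures.
metrics = diagnostic_metrics(tp=38, fn=5, tn=46, fp=3)

for name, value in metrics.items():
    if name == "dor":
        print(f"{name}: {value:.2f}")
    else:
        print(f"{name}: {value * 100:.2f}%")
```

Under these assumed counts the sketch gives an accuracy of 91.30%, sensitivity of 88.37%, specificity of 93.88%, precision of 92.68%, F1-score of 90.48%, and a Diagnostic Odds Ratio of 116.53, which agrees with the figures quoted in the Results. Because the Diagnostic Odds Ratio is a ratio of products of counts, small changes in the number of false positives or false negatives move it considerably on a test set of this size.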