A lightweight machine learning tool for Alzheimer's disease prediction.
Vinay Suresh, Tulika Nahar, Arkansh Sharma, Suhrud Panchawagh, Omer Mohammed, Muneeb Ahmad Muneer, Devansh Mishra, Amogh Verma, Vivek Sanker, Ayush Mishra, Hardeep Singh Malhotra, Ravindra Kumar Garg
Abstract
INTRODUCTION: Alzheimer's disease (AD) is a progressive neurodegenerative disorder that needs better predictive tools. Using the National Alzheimer's Coordinating Center (NACC) Uniform Data Set, this study developed machine learning (ML) models and a practical clinical tool for AD prediction.
METHODS: Data from 52,537 individuals (22,371 with AD) and more than 200 variables were processed with MissForest imputation and genetic algorithm-based feature selection. Multiple ML models were trained, and model interpretability was assessed using SHAP and permutation importance. A LightGBM model was refined through iterative backward feature elimination (IBFE) followed by manual refinement.
RESULTS: LightGBM performed best (receiver operating characteristic-area under the curve [ROC-AUC] 0.91, accuracy 82.0%). Key predictors included arthritis, age, body mass index, and heart rate. A 19-feature model retained comparable accuracy (81.2%) and ROC-AUC (0.90).
DISCUSSION: This lightweight tool predicts AD using mostly routine variables. Limitations include the cross-sectional design and the need for external validation. An interactive web app and GitHub resource are available.
Highlights:
Developed a lightweight ML-based tool using 19 routinely available features.
The lightweight model achieved an ROC-AUC of 0.90 for Alzheimer's disease prediction on NACC multicenter data.
Genetic algorithm, IBFE, and manual refinement enabled optimal feature selection.
Tool hosted on an open-access platform for clinical and research use.
SHAP analysis provided model interpretability and feature-level insights.
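The abstract does not spell out how IBFE was implemented, so the following is only a minimal sketch of one common greedy form of backward feature elimination, not the authors' procedure. The function name `backward_eliminate`, the `max_drop` tolerance, and the toy scoring function are all hypothetical; in practice `score_fn` would be something like a cross-validated ROC-AUC of a model retrained on each feature subset, and the toy gain values below are invented for illustration, not study results.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def backward_eliminate(
    features: Sequence[str],
    score_fn: Callable[[FrozenSet[str]], float],
    max_drop: float = 0.01,
    min_features: int = 1,
) -> Tuple[List[str], float]:
    """Greedy backward elimination (illustrative, not the paper's exact IBFE).

    Repeatedly removes the single feature whose removal costs the least
    score, and stops once any further removal would push the score more
    than `max_drop` below the full-feature baseline.
    """
    current = list(features)
    baseline = score_fn(frozenset(current))
    current_score = baseline
    while len(current) > min_features:
        # Score every subset obtained by dropping exactly one feature.
        candidates = [(score_fn(frozenset(current) - {f}), f) for f in current]
        best_score, to_drop = max(candidates, key=lambda c: c[0])
        if baseline - best_score > max_drop:
            break  # even the cheapest removal hurts too much
        current.remove(to_drop)
        current_score = best_score
    return current, current_score


# Hypothetical per-feature contributions standing in for a real model score.
TOY_GAIN = {
    "age": 0.20,
    "bmi": 0.10,
    "heart_rate": 0.08,
    "noise_a": 0.002,
    "noise_b": 0.001,
}


def toy_score(subset: FrozenSet[str]) -> float:
    """Additive toy score; a real score_fn would retrain and evaluate a model."""
    return 0.5 + sum(TOY_GAIN[f] for f in subset)


selected, score = backward_eliminate(list(TOY_GAIN), toy_score, max_drop=0.01)
print(selected, round(score, 3))
```

With these made-up gains, the two low-value features are removed and the three informative ones are kept, which mirrors the trade-off reported above: a much smaller feature set at a small cost in score.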