Explainable Machine Learning for Heat-Related Illness Prediction: An XGBoost-SHAP Approach Using Korean Meteorological Data.
Chaeyeong Im, Wonji Kim, Heesoo Kim
Abstract
The rising frequency of heat-related illnesses (HRIs) under climate change presents urgent public health challenges, particularly in urban environments. This study develops an explainable machine learning (ML) model to predict HRI risk using meteorological data from seven major South Korean metropolitan cities for May through September of 2021-2024. We applied eXtreme Gradient Boosting (XGBoost) to model the relationship between HRI occurrence and daily meteorological variables, including maximum and mean daily temperatures, humidity, solar radiation, wind speed, and precipitation. Model performance was validated using 2025 data and demonstrated strong predictive accuracy, with an area under the curve (AUC) of 0.895. To enhance interpretability, Shapley Additive exPlanations (SHAP) analysis identified mean daily temperature, solar radiation, and minimum temperature as the strongest contributors to HRI risk. Time-series comparisons of predicted and actual HRI occurrences further validated the model's effectiveness in real-world settings. These findings underscore the potential of eXplainable Artificial Intelligence (XAI) for localized health-risk forecasting and provide a data-driven basis for developing early warning systems for climate-sensitive diseases to guide proactive public health planning amid escalating urban heat risks.
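For readers unfamiliar with the AUC metric reported above, the following is a minimal sketch of how it can be computed from predicted risk scores and observed outcomes. It uses the rank-based (Mann-Whitney) definition: the probability that a randomly chosen HRI day receives a higher score than a randomly chosen non-HRI day. The function name `roc_auc`, the labels, and the scores are hypothetical illustrations; they are not the study's data or code.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: the fraction of (positive, negative) pairs in which
    the positive case receives the higher score. Ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = day with an HRI occurrence, 0 = day without.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
# Hypothetical predicted risk scores, one per day (assumed, not model output).
scores = [0.10, 0.35, 0.40, 0.20, 0.80, 0.65, 0.50, 0.90]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")
```

Under this definition, an AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of HRI and non-HRI days. The pairwise loop is quadratic in the number of days, which is fine for illustration; a sort-based implementation would be preferable for larger datasets.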