Leveraging SHapley Additive exPlanations (SHAP) and fuzzy logic for efficient rainfall forecasts.
Seyed Matin Malakouti
Abstract
The precision of rainfall forecasts remains a critical concern for meteorological services, as accurate forecasts enable governments and communities to prepare for floods, droughts, and water scarcity crises. In this study, we propose a hybrid machine learning framework that combines a Light Gradient Boosting Machine (LGBM) classifier with a fuzzy logic system to deliver rapid and reliable rainfall forecasts using ten years of daily meteorological data from multiple Australian locations. When predicting "rain tomorrow," the LGBM model achieved an accuracy of 85.42% and an AUC of 0.8818, with an average execution time of 4.678 s per forecast. For "rain today," the model achieved near-perfect performance on held-out validation data, with a mean accuracy of 99.6% (range: 98.8%-100.0%) and a mean AUC of 0.998 (range: 0.995-1.000) across 10-fold cross-validation, completing inference in 2.98 s per forecast. These figures, however, reflect internal validation and may not fully generalize to independent datasets. The fuzzy logic component, which takes temperature and humidity as inputs, produced a likelihood score of 78.53% for an example condition (25 °C, 65% humidity) and matched validation data with 100% accuracy after tuning of the membership-function parameters. Taken together, these results indicate that the framework not only outperforms conventional classifiers (e.g., Logistic Regression, Decision Trees, Random Forests, and Gradient Boosting) in both accuracy and computational efficiency but also provides interpretable insights via fuzzy rule-based outputs. In practical terms, this approach can be integrated into early-warning systems for urban flood management, agricultural planning, and water-resource allocation, particularly in regions experiencing climate variability.
The main contributions of this research are (1) the design of a combined LGBM and fuzzy logic system that balances predictive performance with inference speed, (2) a systematic comparison against several baseline algorithms on a large, real-world dataset, and (3) a demonstration of how fuzzy-logic explanations enhance decision-makers' trust in model outputs. However, the study is limited by its reliance on data from Australian meteorological stations; further validation is therefore required to confirm generalizability to other climates and geographic regions. Additionally, while the fuzzy system yields interpretable outputs for temperature-humidity combinations, extending this module to incorporate additional environmental variables (e.g., wind patterns, cloud cover) may further improve robustness. Future work will explore transferability to other weather datasets and the integration of real-time sensor streams for continuous model updating.
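To illustrate how a temperature-humidity fuzzy module of the kind described above can turn two readings into a likelihood score, the following is a minimal sketch in plain Python. It is not the system evaluated in this study: the membership breakpoints, the three rules, the rule outputs, and the function names (`ramp_up`, `tri`, `rain_likelihood`) are all hypothetical, and the defuzzification is a simple zero-order Sugeno-style weighted average chosen for brevity.

```python
def ramp_up(x, a, b):
    """Membership that rises linearly from 0 at a to 1 at b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def tri(x, a, b, c):
    """Triangular membership: 0 at a and c, 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def rain_likelihood(temp_c, humidity_pct):
    """Return an illustrative rain likelihood in percent (0-100)."""
    # Hypothetical membership functions (breakpoints are assumptions).
    humid_high = ramp_up(humidity_pct, 40.0, 90.0)
    humid_low = 1.0 - ramp_up(humidity_pct, 30.0, 60.0)
    temp_mild = tri(temp_c, 10.0, 22.0, 35.0)
    temp_hot = ramp_up(temp_c, 28.0, 40.0)

    # Hypothetical rules as (firing strength, crisp output in percent).
    # AND is modelled with min, a common choice in fuzzy systems.
    rules = [
        (min(humid_high, temp_mild), 90.0),  # humid and mild -> rain likely
        (min(humid_high, temp_hot), 60.0),   # humid and hot  -> rain possible
        (humid_low, 10.0),                   # dry air        -> rain unlikely
    ]

    total_weight = sum(weight for weight, _ in rules)
    if total_weight == 0.0:
        # No rule fires: fall back to a neutral score (an assumption).
        return 50.0
    # Zero-order Sugeno defuzzification: weighted average of rule outputs.
    return sum(weight * output for weight, output in rules) / total_weight


if __name__ == "__main__":
    print(rain_likelihood(25.0, 65.0))
```

Because the parameters here are invented, the score this sketch returns for 25 °C and 65% humidity differs from the 78.53% reported in the abstract; reproducing that value would require the tuned membership functions and rule base used in the study.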