SmartBuildSim: An Open-Source Synthetic-Twin Framework for Reproducible AI Benchmarking in Smart-Building Analytics.
Tymoteusz Miller, Irmina Durlik, Agnieszka Nowy, Ewelina Kostecka
Abstract
This paper introduces SmartBuildSim, an open-source synthetic-twin framework that generates configurable and reproducible multi-sensor building streams using lightweight statistical models with tunable trend, seasonality, correlation, delays, and anomaly mechanisms. Deterministic seeding ensures experiment-level reproducibility, while modular pipelines support unified evaluation across forecasting, anomaly detection, and reinforcement learning (RL) tasks. Validation against an ASHRAE Great Energy Predictor III reference signal indicates that the synthetic data capture realistic magnitude and variability (KS ≈ 0.32; DTW ≈ 9.69) while preserving interpretable and controllable temporal structure. Benchmark results show that simple linear models achieve strong forecasting performance (RMSE ≈ 21.27), IsolationForest outperforms Local Outlier Factor (LOF) in anomaly detection (F1 ≈ 0.17 vs. 0.10), and Soft-Q Learning provides substantially more stable RL convergence than tabular Q-learning (variance reduced by >95%). Scenario-level analyses further illustrate reproducible daily cycles, zone-specific differences, and the scalability of model behaviour across building configurations. By combining declarative YAML configurations, deterministic randomness management, and an extensible scenario engine, SmartBuildSim provides a transparent and lightweight alternative to high-fidelity building simulators. The framework offers a practical, reproducible testbed for smart-building AI research, bridging the gap between simplistic synthetic datasets and complex physical digital twins. All code, tables, figures, and a Google Colab workflow are openly available to ensure full replicability.
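The idea of a seeded, tunable sensor stream can be illustrated with a minimal sketch. This is not SmartBuildSim's API: the function `synth_stream`, its parameters, and the default values below are hypothetical, chosen only to show how a linear trend, a daily cycle, Gaussian noise, and injected spike anomalies can be combined under a fixed seed so that repeated runs yield identical data.

```python
import numpy as np

def synth_stream(n_hours, seed, trend=0.01, daily_amp=5.0, noise_sd=1.0,
                 anomaly_rate=0.02, anomaly_scale=15.0):
    """Hypothetical hourly sensor stream: trend + daily cycle + noise + spikes."""
    # A dedicated generator per call keeps the output a pure function of `seed`.
    rng = np.random.default_rng(seed)
    t = np.arange(n_hours)

    base = 50.0 + trend * t                            # slow linear drift
    seasonal = daily_amp * np.sin(2 * np.pi * t / 24)  # 24-hour cycle
    noise = rng.normal(0.0, noise_sd, size=n_hours)    # measurement noise

    # Each hour is independently flagged as anomalous with probability anomaly_rate.
    is_anomaly = rng.random(n_hours) < anomaly_rate
    spikes = np.where(is_anomaly, anomaly_scale, 0.0)

    return base + seasonal + noise + spikes, is_anomaly

# Same seed, same stream; a different seed gives a different realisation.
values, labels = synth_stream(n_hours=240, seed=7)
repeat, _ = synth_stream(n_hours=240, seed=7)
print(np.array_equal(values, repeat))
```

Returning the anomaly mask alongside the values gives ground-truth labels of the kind needed to score detectors such as IsolationForest or LOF with an F1 metric.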