Artificial intelligence in electrocardiogram-based prediction of heart failure: a systematic review and meta-analysis.
Shunhong Zhang, Jun Jiang, Yi Luo, Guangyue Liu, Saidi Hu, Siran Wan, Chenchen Luo, Hong Li, Nian Li, LinYong Zhao
Abstract
Background: Heart failure (HF) continues to pose a significant global health challenge, with increasing prevalence. Identifying individuals at the highest risk of developing HF early and implementing interventions can prevent or delay disease progression. The application of artificial intelligence (AI) to electrocardiograms (ECGs) presents a novel strategy for early prediction; however, the effectiveness and generalizability of this approach require systematic evaluation. Objective: To systematically evaluate the performance of ECG-based AI models in predicting HF. Methods: This study was registered on PROSPERO (CRD420251012231). Following the PRISMA guidelines, we conducted a systematic literature search across multiple databases, including PubMed, IEEE Xplore, Medline, and Embase, for studies published between 2005 and 2025. The inclusion criteria focused on ECG-based AI models that reported performance metrics such as the area under the receiver operating characteristic curve (AUROC)/C-statistic. A random-effects meta-analysis was performed to evaluate the efficacy of AI in predicting HF through the pooled AUROC/C-statistic. Additionally, we assessed heterogeneity using I², performed subgroup comparisons across ethnicities, and assessed the risk of bias with the PROBAST + AI tool. Results: A total of five studies involving 11 cohorts and 1,728,134 participants were included in the analysis. The pooled AUROC/C-statistic was 0.76 (95% CI: 0.74-0.78; p < 0.001), indicating moderate-to-good discrimination. Subgroup analyses demonstrated consistent performance across different ethnic groups, with AUROC values ranging from 0.77 to 0.79, comparable to the traditional model, which had an AUROC of 0.742 (95% CI: 0.692-0.787; p = 0.575).
Notably, significant heterogeneity was observed among the studies (I² = 89%, p < 0.01), which may be attributed to systematic differences in population characteristics, study design, and data quality. Conclusions: Theoretically, artificial intelligence-enabled electrocardiogram (AI-ECG) models demonstrate promising applicability for predicting HF; however, their effectiveness remains uncertain due to a high risk of bias and a lack of clinical validity studies. Systematic Review Registration: https://www.crd.york.ac.uk/PROSPERO/view/CRD420251012231, PROSPERO CRD420251012231.
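Illustrative sketch: The random-effects pooling and I² heterogeneity statistic named in the Methods can be illustrated with a minimal Python sketch. It assumes the DerSimonian-Laird estimator applied to untransformed AUROC values, with standard errors back-calculated from 95% confidence intervals; the abstract does not state which estimator or transformation was used, so this is not necessarily the authors' procedure. The function names `se_from_ci` and `pool_random_effects` are hypothetical, and the input values are invented for illustration, not taken from the included studies.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def se_from_ci(lower, upper):
    """Approximate a standard error from a symmetric 95% CI (assumption)."""
    return (upper - lower) / (2 * Z_95)


def pool_random_effects(estimates, std_errors):
    """DerSimonian-Laird random-effects pooling (assumed estimator).

    Returns (pooled estimate, (ci_low, ci_high), I^2 in percent).
    """
    k = len(estimates)
    # Fixed-effect (inverse-variance) weights and pooled estimate
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    # Cochran's Q and its degrees of freedom
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = k - 1
    # Between-study variance tau^2, truncated at zero
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    # I^2: share of total variability attributed to heterogeneity
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    ci = (pooled - Z_95 * se_pooled, pooled + Z_95 * se_pooled)
    return pooled, ci, i2


# Hypothetical cohort-level AUROCs with 95% CIs (illustrative values only)
aurocs = [0.72, 0.78, 0.75, 0.80]
cis = [(0.70, 0.74), (0.76, 0.80), (0.72, 0.78), (0.78, 0.82)]
ses = [se_from_ci(lo, hi) for lo, hi in cis]
pooled, ci, i2 = pool_random_effects(aurocs, ses)
print(f"pooled AUROC = {pooled:.3f}, 95% CI = {ci[0]:.3f}-{ci[1]:.3f}, I2 = {i2:.0f}%")
```

In practice, AUROC values are often pooled on a transformed scale (for example, logit) to keep interval bounds within 0 and 1; the raw scale is used here only to keep the sketch short.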