Machine Learning Models for Point-of-Care Diagnostics of Acute Kidney Injury.
Chun-You Chen, Te-I Chang, Cheng-Hsien Chen, Shih-Chang Hsu, Yen-Ling Chu, Nai-Jen Huang, Yuh-Mou Sue, Tso-Hsiao Chen, Feng-Yen Lin, Chun-Ming Shih, Po-Hsun Huang, Hui-Ling Hsieh, Chung-Te Liu
Abstract
Background/Objectives: Computerized diagnostic algorithms can detect acute kidney injury (AKI) early, but only when a baseline serum creatinine (SCr) value is available. To address this limitation, we sought to construct a machine learning model for AKI diagnosis based on point-of-care clinical features, regardless of baseline SCr. Methods: Patients with SCr > 1.3 mg/dL were recruited retrospectively from Wan Fang Hospital, Taipei. Dataset A (n = 2846) served as the training dataset and Dataset B (n = 1331) as the testing dataset. Point-of-care features, including laboratory data and physical readings, were input into machine learning models. The repeated machine learning models were trained on a random 70% of Dataset A and tested on the remaining 30%, over 1000 rounds. The single machine learning models used Dataset A as the training dataset and Dataset B as the testing dataset. A computerized algorithm for AKI diagnosis based on a 1.5× increase in SCr and the clinician's AKI diagnosis were compared with the machine learning models. Results: On an independent, unbalanced test set (n = 1331), our machine learning models achieved AUROC values ranging from 0.67 to 0.74. Crucially, all machine learning models significantly outperformed the routine clinician's diagnosis (AUROC up to 0.74 vs. 0.53, p < 0.05). For context, a pre-existing computerized algorithm, which requires baseline SCr data, performed best, achieving an AUROC of 0.94 on a relevant subset of the data; this marks the performance benchmark when baseline data are available. Formal statistical comparisons revealed that the top-performing models (e.g., Random Forest, SVM) were often statistically indistinguishable. Model performance was highly dependent on the test scenario, with precision and F1 scores improving markedly on a balanced dataset.
Conclusions: In the absence of baseline SCr, machine learning models can diagnose AKI with significantly greater accuracy than routine clinical diagnoses. Our robust statistical analysis suggests that several advanced algorithms achieve a similarly high level of performance.
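The repeated 70%/30% evaluation described in the Methods can be illustrated with a minimal sketch. It is not the study's code: the names `auroc`, `fit_centroid`, and `repeated_holdout` are hypothetical, the centroid-difference scorer is a toy stand-in for the actual models (e.g., Random Forest, SVM), and the data are synthetic. It only shows the mechanics of drawing a random split each round and collecting one AUROC per round.

```python
import random
from statistics import mean


def auroc(labels, scores):
    """Rank-based AUROC: fraction of (positive, negative) pairs in which
    the positive case scores higher; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one case of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_centroid(train):
    """Toy stand-in model (an assumption, not the paper's method):
    score a case by projecting it onto (mean of AKI cases - mean of non-AKI cases)."""
    dim = len(train[0][0])
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    w = [mean(x[j] for x in pos) - mean(x[j] for x in neg) for j in range(dim)]
    return lambda x: sum(wj * xj for wj, xj in zip(w, x))


def repeated_holdout(records, fit, rounds=1000, train_frac=0.7, seed=0):
    """Shuffle, split into train/test, fit, and score; one AUROC per round."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(rounds):
        shuffled = records[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_frac)
        train, test = shuffled[:cut], shuffled[cut:]
        # Skip the (rare) round in which either part lacks one of the classes.
        if len({y for _, y in train}) < 2 or len({y for _, y in test}) < 2:
            continue
        score = fit(train)
        aucs.append(auroc([y for _, y in test], [score(x) for x, _ in test]))
    return aucs


def synthetic_records(n=200, seed=1):
    """Synthetic two-feature data; the first feature separates the classes."""
    rng = random.Random(seed)
    return [([rng.gauss(2.0 * (i % 2), 1.0), rng.gauss(0.0, 1.0)], i % 2)
            for i in range(n)]


if __name__ == "__main__":
    aucs = repeated_holdout(synthetic_records(), fit_centroid, rounds=1000)
    print(f"rounds kept: {len(aucs)}, mean AUROC: {mean(aucs):.3f}")
```

Reporting the distribution of per-round AUROC values, rather than a single split, reduces the influence of any one lucky or unlucky partition; the independent Dataset B evaluation then plays the role of a separate external check.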