Application of machine learning in constructing a diagnostic model for neonatal biliary atresia.
Dingding Wang, Jie Sun, Yuyan Jin, Yanan Zhang, Yong Zhao, Kaiyun Hua, Yichao Gu, Shuangshuang Li, Junmin Liao, Peize Wang, Dayan Sun, Jinshi Huang
Abstract
Importance: Early diagnosis of biliary atresia (BA) is important for enabling an earlier Kasai operation and improving BA prognosis. Objective: To develop machine learning (ML) models for neonatal BA diagnosis using clinical characteristics and serological data. Methods: Neonates presenting with pathological jaundice between January 1, 2013, and December 31, 2023, were enrolled. Five ML models, namely logistic regression (LR), random forest (RF), support vector machine classifier (SVC), multilayer perceptron (MLP), and extreme gradient boosting (XGBoost), were trained using neonatal clinical and laboratory data. The stacking classifier (SC) algorithm was employed to select the best-performing models for constructing the ensemble learning model. Results: This study included 85 patients, 42 of whom were diagnosed with BA. Among the five ML models, XGBoost (area under the receiver operating characteristic curve [AUC] = 1.000; 95% confidence interval [CI]: 1.000-1.000) and RF (AUC = 1.000; 95% CI: 1.000-1.000) demonstrated better diagnostic performance than the other models. All models showed acceptable consistency between the predicted and actual probabilities. The SC model, built on the LR, RF, and XGBoost models, also exhibited strong generalization ability and diagnostic performance (AUC = 1.000; 95% CI: 1.000-1.000). Key diagnostic predictors included elevated gamma-glutamyl transpeptidase (GGT) (AUC = 0.837; 95% CI: 0.749-0.925), increased platelet (PLT) counts (AUC = 0.728; 95% CI: 0.618-0.838), and acholic stools (AUC = 0.765; 95% CI: 0.663-0.867). An XGBoost-based nomogram was also developed. Interpretation: ML models demonstrate high diagnostic accuracy for BA in neonates, with GGT, PLT counts, and acholic stools as pivotal predictors. This approach may enable earlier BA identification and intervention during the neonatal period.
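For readers unfamiliar with the single-predictor AUC values reported above (for example, for GGT), the following is a minimal sketch of how an AUC and a 95% CI can be computed for one continuous predictor. It is illustrative only: the abstract does not state how the confidence intervals were obtained, so the percentile bootstrap used here is an assumption, and the function names (`auc`, `bootstrap_ci`) and the toy GGT values are hypothetical and not study data.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case has a
    higher score than a randomly chosen negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (an assumed method, not
    necessarily the one used in the study)."""
    rng = random.Random(seed)
    indices = range(len(labels))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(indices) for _ in indices]
        ys = [labels[i] for i in sample]
        if len(set(ys)) < 2:
            # Skip resamples that contain only one class; AUC is undefined.
            continue
        stats.append(auc(ys, [scores[i] for i in sample]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data, invented for illustration: 1 = BA, 0 = non-BA; GGT in U/L.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
ggt = [310, 450, 120, 520, 90, 150, 60, 200]

point = auc(labels, ggt)
lo, hi = bootstrap_ci(labels, ggt)
print(f"AUC = {point:.3f}; 95% CI: {lo:.3f}-{hi:.3f}")
```

The pairwise definition is used because it makes the interpretation of AUC explicit; with a cohort of 85 patients the quadratic cost is negligible.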