Interpretable machine learning of clinical MALDI-TOF spectra discriminates carbapenem-resistant Klebsiella pneumoniae while revealing phylogenetic heterogeneity that limits model generalizability.
Chuangye Cai, Mengxue Zou, Mingxiao Chen, Peibo Yuan, Zhencheng Fang, Lanlan Zhong, Dingqiang Chen, Hongwei Zhou, Nianyi Zeng
Abstract
Introduction: Carbapenem-resistant Klebsiella pneumoniae (CRKP) poses a significant public health threat. Rapid detection of CRKP and its resistance mechanisms is essential for optimizing antibiotic therapy and infection control, but implementing such detection in clinical practice remains challenging.
Methods: Machine learning classifiers were applied to matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) data to discriminate KPC-type CRKP, NDM-type CRKP, and carbapenem-susceptible K. pneumoniae (CSKP) strains. Model performance was validated across platforms and strain collections. SHapley Additive exPlanations (SHAP) analysis and phylogenetic reconstruction were used to interpret feature contributions and genetic determinants.
Results: Significant spectral divergence was observed among K. pneumoniae phenotypes, particularly between KPC and non-KPC strains. Random forest (RF) classifiers discriminated KPC from non-KPC strains perfectly (AUC = 1.00) and classified CRKP versus CSKP isolates robustly (AUC = 0.809). Differentiation between NDM and CSKP isolates remained challenging, with only moderate diagnostic reliability (AUC = 0.67-0.87) and inconsistent performance across platforms. Optimization strategies did not significantly improve NDM-CSKP classification, underscoring how small the spectral differences between these groups are. SHAP analysis identified the peak at m/z 4521.91 as the key feature for KPC classification, whereas NDM strains lacked distinctive spectral features. Phylogenetic analysis showed that KPC strains formed a distinct cluster, while NDM and CSKP strains were intermixed, which explains the difficulty of separating them by MALDI-TOF MS profile.
Conclusion: This study developed models to classify KPC, NDM, and CSKP strains using MALDI-TOF MS combined with machine learning. KPC strains were classified effectively across platforms, whereas NDM and CSKP strains showed limited differentiation because of their close evolutionary relationship. Effective classification requires consideration of regional strain variation and periodic model updates informed by local epidemiology.
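To make the reported AUC values easier to read, the following minimal sketch computes an AUC as the probability that a randomly chosen positive isolate receives a higher classifier score than a randomly chosen negative one. It is an illustration only: the function name `roc_auc` and all labels and scores are hypothetical and are not taken from the study's data or code.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores (NOT data from the study).
labels = [1, 1, 1, 0, 0, 0]
separated = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]      # every positive outranks every negative
overlapping = [0.9, 0.4, 0.3, 0.8, 0.35, 0.1]   # score distributions overlap

print(roc_auc(labels, separated))    # 9 of 9 pairs ranked correctly
print(roc_auc(labels, overlapping))  # 6 of 9 pairs ranked correctly
```

In this toy setting, fully separated scores give an AUC of 1.0, the situation reported for KPC versus non-KPC strains, while overlapping scores give about 0.67, the lower end of the range reported for NDM versus CSKP.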