Enhancing the prediction of hospital discharge disposition with extraction-based language model classification.
William R Small, Ryan J Crowley, Chloe Pariente, Jeff Zhang, Kevin P Eaton, Lavender Yao Jiang, Eric Oermann, Yindalon Aphinyanaphongs
Abstract
Early identification of inpatient discharges to skilled nursing facilities (SNFs) facilitates care transition planning. Predictive information in admission history and physical notes (H&Ps) is dispersed across long documents. Language models adeptly predict clinical outcomes from text but have limitations: token length constraints, noisy inputs, and opaque outputs. Therefore, we developed extraction-based language model classification (ELC): generative language models distill H&Ps into task-relevant categories ("Structured Extracted Data") before summarizing them into a concise narrative ("AI Risk Snapshot"). We hypothesized that language models using AI Risk Snapshots to predict SNF discharges would perform best. In this retrospective observational study, nine language models predicted SNF discharges from unstructured predictors (raw H&P text, truncated assessment and plan) and ELC-derived predictors (Structured Extracted Data, AI Risk Snapshots). ELC substantially reduced input length (AI Risk Snapshot median 141 tokens vs raw H&P median 2,120 tokens) and improved average AUROC and AUPRC across models. The best performance was achieved by Bio+Clinical BERT fine-tuned on AI Risk Snapshots (AUROC = 0.851). AI Risk Snapshots enhanced interpretability by aligning with nurse case managers' risk assessments and facilitating prompt design. Structuring and summarizing H&Ps via ELC thus mitigates the practical limitations of language models and improves SNF discharge prediction.
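The extract, summarize, classify flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the category names, the keyword rules, the scoring function, and the example note are all hypothetical stand-ins, invented here only to show how the three stages chain together. In the study, the extraction and summarization stages are performed by generative language models and the final stage by a fine-tuned classifier.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical stage signatures; the abstract does not specify
# categories, prompts, or model interfaces.
Extractor = Callable[[str], Dict[str, str]]
Summarizer = Callable[[Dict[str, str]], str]
Classifier = Callable[[str], float]


@dataclass
class ELCResult:
    structured: Dict[str, str]  # stand-in for "Structured Extracted Data"
    snapshot: str               # stand-in for "AI Risk Snapshot"
    snf_score: float            # stand-in for a predicted SNF discharge score


def run_elc(note: str, extract: Extractor, summarize: Summarizer,
            classify: Classifier) -> ELCResult:
    """Chain the three stages: extract categories, summarize, classify."""
    structured = extract(note)
    snapshot = summarize(structured)
    return ELCResult(structured, snapshot, classify(snapshot))


# Toy stand-ins so the sketch runs without any model (assumptions only).
def toy_extract(note: str) -> Dict[str, str]:
    lowered = note.lower()
    return {
        "mobility": "uses walker" if "walker" in lowered else "not documented",
        "living_situation": "lives alone" if "lives alone" in lowered else "not documented",
    }


def toy_summarize(structured: Dict[str, str]) -> str:
    return "; ".join(f"{key}: {value}" for key, value in structured.items())


def toy_classify(snapshot: str) -> float:
    # Fraction of two illustrative risk phrases present in the snapshot.
    hits = sum(term in snapshot for term in ("uses walker", "lives alone"))
    return hits / 2


# Made-up note text, for illustration only.
note = "82-year-old admitted after a fall. Lives alone. Ambulates with a walker at baseline."
result = run_elc(note, toy_extract, toy_summarize, toy_classify)
print(result.snapshot)
print(result.snf_score)
```

Passing the stages in as callables keeps the sketch independent of any particular model: each toy function could be swapped for a call to a generative model or a fine-tuned classifier without changing `run_elc`. The snapshot is shorter than the note it was built from, which is the property the abstract reports at scale.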