Leveraging Machine Learning and Real-World Data to Predict Chronic Obstructive Pulmonary Disease Exacerbations.
Reynold A Panettieri, Jason Roy, Natalia Gontarczyk Uczkowski, Allison Tyler, Jason Attanucci, Thomas G O'Riordan, Kristin Kahle-Wrobleski
Abstract
Background: Previously, we reported that applying artificial intelligence and natural language processing to electronic health record (EHR) data can identify patients at risk of chronic obstructive pulmonary disease (COPD) exacerbations, based on clinical attributes identified in COPDGene.
Purpose: Building on these data and using real-world data, we established a predictive model for identifying patients at risk of COPD exacerbations within 24 months of their initial COPD diagnosis.
Methods: Structured and unstructured data were obtained from Epic EHR data. Summary statistics for independent variables, including age, gastroesophageal reflux disease, coronary artery disease, congestive heart failure, cor pulmonale, asthma, dyspnea, smoking status, number of comorbidities, and blood eosinophil counts, were calculated. Bivariate associations with COPD exacerbations were calculated using odds ratios and 95% confidence intervals. A multivariable prediction model using the flexible machine-learning approach, Bayesian Additive Regression Trees (BART), was then developed. Model performance was assessed using receiver operating characteristic (ROC) curves and area under the ROC curve (AUC).
Results: Of the 3007 patients with COPD as a primary diagnosis, 886 had a COPD exacerbation within 24 months. In the bivariate logistic regression analyses, strong associations (odds ratio >1.5; P <0.05) existed between COPD exacerbation and cor pulmonale, moderate and severe dyspnea, and number of comorbidities (≥4 vs 0). In the BART model, the predictors that were selected most for the branching-tree analyses were eosinophil count, pack years, and moderate dyspnea (in order of most selected). The AUC derived from our BART model was 0.69.
Conclusion: Eosinophil count and dyspnea were identified as important predictors of exacerbations. Our data suggest that active monitoring of eosinophil counts and selected patient-reported experiences of dyspnea may identify patients at risk of exacerbations, enabling clinicians to tailor therapies to improve health outcomes among patients with COPD.
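For readers less familiar with the statistics named in the Methods, the following is a minimal sketch of how an odds ratio with a Wald 95% confidence interval and an AUC can be computed. It is not the authors' analysis code and does not reproduce the BART model. The function names `odds_ratio_ci` and `auc` are hypothetical, and the counts and scores are toy values chosen for illustration, not study data.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: exposed, with event      b: exposed, without event
    c: unexposed, with event    d: unexposed, without event
    z=1.96 gives an approximate 95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def auc(scores, labels):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy 2x2 table (illustrative counts only, not from the study)
or_value, ci_low, ci_high = odds_ratio_ci(30, 20, 40, 60)

# Toy predicted risks and observed outcomes (illustrative only)
toy_auc = auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

Under this pairwise definition, an AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of patients with and without an event.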