A practical approach to handling missing sputum samples in airway disease research.
Laurits Frøssing, Susanne Hansen, Morten Hvidtfeldt, Asger Sverrild, Celeste Porsbjerg
Abstract
Background: Missing sputum samples are common in research and hamper the use of paired sputum-based outcome measures. This study aimed to illustrate the effects of different methods for handling missing data. Methods: Data from three intervention studies were pooled. Eosinophil and neutrophil counts were imputed using unconditional median imputation (UCM) and multiple imputation (MI). First, the impact of imputation on the prediction of improvement in lung function and airway hyperreactivity was compared using linear regression. Second, missing data were simulated at different frequencies (20%, 40% and 60%) and the imputed values were compared with the true values. Results: We included 115 patients, of whom 103 had at least one available sputum sample. No significant difference was found between the imputation methods, and they had similar effects on the prediction model for improvement in clinical outcomes (n=115) in terms of β, SE and significance level, with the effect estimates slightly more conservative using MI. Through simulation, we found that bias increased with increasing mean for both UCM and MI. UCM consistently provided narrower limits of agreement but, importantly, also introduced a systematic proportional bias. The limits of agreement were markedly narrower when only 20% of samples were missing. Conclusion: Imputation increases statistical power and addresses rather than ignores missing data. We suggest that sputum-based outcome measures be reported using both complete case analyses and analyses with missing data imputed, with the choice of imputation method based on study design, the consequences of over- and under-estimation, and the ability to address bias.
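The simulation step described in the Methods can be illustrated with a minimal sketch. It is not the authors' analysis code. It assumes synthetic, log-normally distributed values standing in for sputum eosinophil percentages, missingness completely at random, and Bland-Altman style limits of agreement (bias ± 1.96 SD). The function name `simulate_ucm` and all parameter values are hypothetical, and only UCM is shown; MI would require a full imputation model.

```python
import numpy as np


def simulate_ucm(true_values, missing_fraction, rng):
    """Mask a random fraction of values, impute them with the median of the
    remaining observed values (UCM), and compare imputed with true values."""
    true_values = np.asarray(true_values, dtype=float)
    n = true_values.size
    n_missing = int(round(missing_fraction * n))

    # Assumption: values are missing completely at random.
    missing_idx = rng.choice(n, size=n_missing, replace=False)
    observed = np.delete(true_values, missing_idx)

    # Unconditional median imputation: one constant for every missing value.
    imputed = np.full(n_missing, np.median(observed))
    truth = true_values[missing_idx]

    # Bland-Altman style agreement between imputed and true values.
    diff = imputed - truth
    pair_mean = (imputed + truth) / 2.0
    bias = diff.mean()
    sd = diff.std(ddof=1)

    # Slope of difference on mean; a non-zero slope indicates proportional bias.
    slope = np.polyfit(pair_mean, diff, 1)[0]

    return {
        "n_missing": n_missing,
        "bias": bias,
        "loa_lower": bias - 1.96 * sd,
        "loa_upper": bias + 1.96 * sd,
        "slope": slope,
    }


# Hypothetical data: 115 synthetic "eosinophil percentages" (not study data).
rng = np.random.default_rng(0)
synthetic = rng.lognormal(mean=0.5, sigma=1.0, size=115)

for fraction in (0.2, 0.4, 0.6):
    result = simulate_ucm(synthetic, fraction, rng)
    print(fraction, {k: round(float(v), 3) for k, v in result.items()})
```

Because UCM replaces every missing value with the same constant, the difference between imputed and true values falls linearly as the true value rises. In this sketch the slope of difference on mean is -2 by construction, which is one way to see why a single-value imputation produces a systematic proportional bias.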