Pregnancy identification method as a source of bias in studies of prenatal exposures using real-world data.
Chase D Latour, Jessie K Edwards, Michele Jonsson Funk, Elizabeth A Suarez, Kim Boggess, Mollie E Wood
Abstract
Researchers typically identify pregnancies in healthcare data based on observed outcomes. This approach misses pregnancies that received prenatal care but whose outcomes were not recorded, potentially inducing selection bias in prenatal effect estimates. Alternatively, prenatal encounters can be used to identify pregnancies with unobserved outcomes, but this requires addressing loss to follow-up (LTFU). We simulated 10,000,000 pregnancies and estimated the total effect of treatment on preeclampsia. Across 36 scenarios, we varied the treatment effect on miscarriage and/or preeclampsia; the percent LTFU (5% or 20%); and the cause of LTFU: (1) measured covariates, (2) unobserved miscarriage, or (3) both. We created three analytic samples to address LTFU (observed deliveries; observed deliveries and miscarriages; and all pregnancies) and estimated treatment effects in each using non-parametric direct standardization. Risk differences (RDs) and risk ratios (RRs) from the three samples were similarly biased when LTFU was due to miscarriage (log-transformed RR bias: -0.12 to 0.33 among observed deliveries; -0.11 to 0.32 among observed deliveries and miscarriages; and -0.11 to 0.32 among all pregnancies). When predictors of LTFU were measured, only estimates among all pregnancies were unbiased (-0.27 to 0.33; -0.29 to 0.03; and -0.02 to 0.01, respectively). While including all pregnancies does not prevent bias, it quantifies the extent of selection, enabling direct assessment of its potential impact on findings.
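As an illustration of the estimator named in the abstract, the following is a minimal sketch of non-parametric direct standardization for a binary treatment and a binary outcome, with a single categorical covariate. It is not the authors' code. The function names, the record layout `(stratum, treated, outcome)`, the toy counts, and the definition of log-RR bias as log(estimated RR) minus log(true RR) are all assumptions made for this example, and the sketch does not handle LTFU or competing events such as miscarriage.

```python
import math
from collections import defaultdict


def make_records(stratum, treated, n, events):
    """Hypothetical helper: expand cell counts into (stratum, treated, outcome) rows."""
    return [(stratum, treated, 1)] * events + [(stratum, treated, 0)] * (n - events)


def standardized_risks(records):
    """Return (risk if untreated, risk if treated), standardized to the
    covariate distribution of the full sample."""
    n_stratum = defaultdict(int)          # stratum -> number of pregnancies
    cells = defaultdict(lambda: [0, 0])   # (stratum, treated) -> [n, events]
    total = 0
    for stratum, treated, outcome in records:
        n_stratum[stratum] += 1
        cells[(stratum, treated)][0] += 1
        cells[(stratum, treated)][1] += outcome
        total += 1

    risks = {}
    for arm in (0, 1):
        risk = 0.0
        for stratum, n_s in n_stratum.items():
            n, events = cells[(stratum, arm)]
            if n == 0:
                # Positivity violation: no one in this stratum received this arm.
                raise ValueError(f"no records for stratum={stratum!r}, treated={arm}")
            # Stratum-specific risk, weighted by the stratum's share of the sample.
            risk += (events / n) * (n_s / total)
        risks[arm] = risk
    return risks[0], risks[1]


def effect_measures(records):
    """Risk difference and risk ratio from the standardized risks."""
    r0, r1 = standardized_risks(records)
    return {"rd": r1 - r0, "rr": r1 / r0}


def log_rr_bias(estimated_rr, true_rr):
    """Bias on the log scale (assumed definition: log estimate minus log truth)."""
    return math.log(estimated_rr) - math.log(true_rr)


# Toy data (invented for illustration): two covariate strata.
example = (
    make_records("low", 1, 10, 2) + make_records("low", 0, 30, 3)
    + make_records("high", 1, 40, 16) + make_records("high", 0, 20, 4)
)
print(effect_measures(example))
```

In the toy data, the stratum-specific risks are 0.2 versus 0.1 in the "low" stratum and 0.4 versus 0.2 in the "high" stratum, so the standardized RR is about 2.0, whereas the crude RR (18/50 versus 7/50) is larger because treatment is more common in the higher-risk stratum.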