Data accuracy in the European Cystic Fibrosis Society Patient Registry: results of an on-site data validation project.
Naehrlich Lutz, Fox Alice, Krasnyk Marko, Wollscheid Nadine, Silvia Lorca Mayor, Zolin Anna, Prasad Vibha, ECFS Patient Registry Steering Group
Abstract
BACKGROUND: Patient registries are valuable tools for epidemiological research, especially for rare diseases, and a high level of data quality is essential but not always demonstrated. Although crucial, the quality management process in patient registries rarely includes data validation. The European Cystic Fibrosis Society Patient Registry (ECFSPR) collects clinical data about people with cystic fibrosis (pwCF) in Europe, as defined by the World Health Organisation (WHO) European region. The ECFSPR conducted this on-site data validation project to assess its feasibility and the accuracy of the data, and to identify areas for improvement.
METHODS: From November 2018 to April 2024 the ECFSPR visited centres to validate data on-site, assessing the accuracy and validity of source data for key variables related to demographics, diagnosis, organ transplant and annual disease progression. We compared data submitted to the ECFSPR with medical health records (MHR) at participating centres; ECFSPR data follow standardised variable definitions. Accuracy (incl. validity) was expressed as the percentage of validated data points that matched the MHR.
RESULTS: We validated source data on-site in 34 of 40 (85%) participating countries and 133 of 397 (34%) centres, for 4024 pwCF (7.5% of the ECFSPR 2021 dataset). Accuracy was high for demographic data (month and year of birth, sex) and transplant data (>99%), and for annual clinical data on disease progression (selected infections, medication, complications; >94%). Accuracy for genetic information was 96.6% where the original genotyping laboratory report was available, which was the case for 85% of all pwCF. Anthropometric measurements and lung function data showed lower accuracy (87-88% of the validated data), primarily due to non-adherence to the parameters for selecting the encounter for the annual lung function assessment. Data for liver disease were also comparatively less accurate (92%); this may reflect diagnostic heterogeneity.
CONCLUSIONS: The ECFSPR on-site data validation project demonstrated its feasibility and confirmed the high accuracy of data for critical variables, while also revealing specific areas for targeted quality improvement efforts.
CLINICAL TRIAL NUMBER: Not applicable.
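The accuracy measure defined in METHODS (validated data points that match the MHR, as a percentage of all validated data points) can be illustrated with a minimal sketch. The function name `percent` and the counts in the accuracy example are hypothetical and chosen only for illustration; they are not taken from the registry. The coverage figures are the ones reported in RESULTS.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage (hypothetical helper, not ECFSPR code)."""
    if whole <= 0:
        raise ValueError("whole must be a positive count")
    if part < 0 or part > whole:
        raise ValueError("part must lie between 0 and whole")
    return 100.0 * part / whole


# Coverage figures reported in RESULTS.
country_coverage = percent(34, 40)    # countries visited out of participating countries
centre_coverage = percent(133, 397)   # centres visited out of participating centres

# Accuracy for one variable: matching data points over validated data points.
# These two counts are invented for illustration only.
matching_points = 966
validated_points = 1000
accuracy = percent(matching_points, validated_points)

print(f"countries: {country_coverage:.0f}%")
print(f"centres: {centre_coverage:.0f}%")
print(f"example accuracy: {accuracy:.1f}%")
```

Rounded to whole percentages, the two coverage values agree with the 85% and 34% stated in the abstract.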