Digitization and Linkage of PDF Formatted 12-Lead Electrocardiograms in Adult Congenital Heart Disease.
Muhammet Alkan, Fani Deligianni, Christos Anagnostopoulos, Idris Zakariyya, Gruschen R Veldtman
Abstract
Background: Twelve-lead electrocardiograms (ECGs) form an essential part of the late follow-up of patients with adult congenital heart disease (ACHD). Such ECGs are most frequently reviewed by clinicians in paper or PDF formats. These visual representations of the original vector data do not lend themselves easily to direct analysis with the increasingly powerful machine learning algorithms that hold promise in risk prediction and early prevention of adverse events. Methods: In this work, we set out to create digital signals from ECG PDF documents through a series of data processing steps, validate the accuracy of the process, and demonstrate its potential utility in research. Using 4153 ECG PDF documents from 436 patients with ACHD, we created a "pipeline" to digitize the visually represented ECG vector datasets. We then validated the digitized ECG dataset across all patients using several features that are also calculated by the vendor, such as QRS duration, PR interval, and ventricular rate. Results: We confirmed a strong correlation with the vendor-measured ECG parameters, including PR interval (R = 0.941, P < 0.05), QRS duration (R = 0.949, P < 0.05), and ventricular rate (R = 0.971, P < 0.05). Further, using a support vector machine, a well-established machine learning model, we demonstrate the ability of the digitized ECG dataset to accurately predict anatomic diagnosis in ACHD. Conclusions: Digitization of PDF-formatted ECG signal data can be accomplished with good accuracy and can be used in clinical research in ACHD.
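The validation step described above compares each digitized measurement with the corresponding vendor-reported value. A minimal sketch of such a comparison follows, assuming a Pearson correlation coefficient (the abstract reports R but does not name the coefficient). The function name `pearson_r` and the sample values are hypothetical and serve only as illustration; they are not study data and this is not the authors' implementation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (>= 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations of each value from its sequence mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up QRS durations in ms, for illustration only (not study data).
vendor_qrs = [92.0, 104.0, 118.0, 136.0, 150.0]
digitized_qrs = [95.0, 101.0, 120.0, 131.0, 153.0]

r = pearson_r(vendor_qrs, digitized_qrs)
print(f"R = {r:.3f}")
```

The same function could be applied to PR interval and ventricular rate, one pair of sequences per parameter. A significance test for the P value is omitted here.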