Application of machine learning in the diagnostic work-up of telomere biology disorders.
Erika Massaccesi, Luca Arcuri, Giacomo Cavalca, Fabian Beier, Lucia Vankann, Michela Lupia, Davide Cangelosi, Alice Grossi, Marina Lanciotti, Filomena Pierri, Francesca Fioredda, Maurizio Miano, Gianluca Dell'Orso, Maria Carla Giarratana, Daniela Guardo
Abstract
We applied supervised and unsupervised machine learning (ML) analyses to a cohort of 140 patients referred to the Hematology Unit of the G. Gaslini Institute from 1989 to 2023 for persistent cytopenia and/or features suggestive of telomere biology disorders (TBDs). Patients were labeled as "TBD" (n = 20, established molecular diagnosis of TBD), "other diagnosis" (OD, n = 27, established molecular diagnosis of a congenital disease, including marrow failures), or "undefined diagnosis" (UD, n = 93, no established molecular diagnosis). A random forest model was trained on the 47 patients with an established molecular diagnosis (20 TBD and 27 OD) and then applied to the UD group, where it predicted potential TBD in 16/93 patients and potential OD in 77/93 patients, accounting for 17.2% and 82.8% of possibly reallocated diagnoses, respectively. The unsupervised approach, applied to the whole cohort (n = 140), identified 4 distinct clusters that were significantly associated (P = 0.000001) with the 47 established molecular diagnoses, with TBD patients prevailing in Clusters 1 and 2 and OD patients in Clusters 3 and 4. Telomere length (TL) and mucocutaneous abnormalities were the most relevant drivers in discriminating between the TBD and OD groups in both the supervised and unsupervised analyses, and both features prevailed in Clusters 1 and 2. Interestingly, the two analyses yielded similar results in the UD group: all 16 of the 93 patients without a molecular diagnosis who were predicted to have TBD by the supervised approach were placed in the "TBD clusters" (Clusters 1 and 2) of the unsupervised analysis. This model might correctly reallocate a remarkable proportion of undefined or previously misclassified cases, thus potentially leading to a substantially improved diagnostic work-up of rare and challenging diseases like TBD.
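The two-step workflow described in the abstract can be illustrated with a minimal sketch: a random forest trained on the labeled patients and applied to the unlabeled ones, followed by clustering of the whole cohort and a cross-tabulation of clusters against the known diagnoses. Everything below is an assumption made for illustration and is not the authors' pipeline: the data are synthetic, the two features (`tl_zscore`, `mucocutaneous`) are hypothetical stand-ins for the reported drivers, the composition of the synthetic UD group is arbitrary, and k-means is swapped in because the abstract does not name the clustering algorithm. Only the group sizes (20, 27, 93) and the number of clusters (4) are taken from the text.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
FEATURES = ["tl_zscore", "mucocutaneous"]  # hypothetical feature names


def synthetic_patients(n, tbd_like):
    """Draw n synthetic patients; NOT real data from the study."""
    # Assumption: TBD-like patients have shorter telomeres and more
    # frequent mucocutaneous abnormalities.
    tl = rng.normal(-2.5 if tbd_like else 0.0, 0.5, size=n)
    muco = rng.random(n) < (0.8 if tbd_like else 0.1)
    return np.column_stack([tl, muco.astype(float)])


# Labeled set: 20 TBD (label 1) and 27 OD (label 0), as in the abstract.
X_labeled = np.vstack([synthetic_patients(20, True),
                       synthetic_patients(27, False)])
y_labeled = np.array([1] * 20 + [0] * 27)

# Unlabeled UD group of 93 patients; the 20/73 mix is arbitrary.
X_ud = np.vstack([synthetic_patients(20, True),
                  synthetic_patients(73, False)])

# Supervised step: train on the 47 labeled patients, predict the UD group.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_labeled, y_labeled)
ud_pred = forest.predict(X_ud)
importances = dict(zip(FEATURES, forest.feature_importances_))

# Unsupervised step: cluster the whole cohort (n = 140) into 4 groups.
X_all = np.vstack([X_labeled, X_ud])
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_all)

# Cross-tabulate clusters against the 47 known diagnoses
# (rows: cluster index, columns: OD, TBD).
crosstab = np.zeros((4, 2), dtype=int)
for cluster_id, label in zip(clusters[:len(y_labeled)], y_labeled):
    crosstab[cluster_id, label] += 1

print("UD patients predicted as TBD:", int(ud_pred.sum()), "of", len(ud_pred))
print("Feature importances:", importances)
print("Cluster x diagnosis table:\n", crosstab)
```

In a sketch like this, agreement between the two analyses would be checked by comparing `ud_pred` with the cluster assignments of the same patients, `clusters[len(y_labeled):]`, and the association between clusters and diagnoses would be tested on `crosstab` with a suitable contingency-table test.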