Data-Driven Detection of Subclinical Keratoconus via Semi-Supervised Clustering of Multidimensional Corneal Biomarkers.
Lynn Kandakji, Shafi Balal, Aleksander Stupnicki, Siyin Liu, Marcello Leucci, Dan Gore, Bruce Allan, Nikolas Pontikos
Abstract
Purpose: To objectively identify subclinical keratoconus (SKC) from a large sample of healthy and keratoconus (KC) patients via a data-driven framework applied to corneal imaging data from an anterior-segment OCT (AS-OCT) device (MS-39, CSO Italia).
Design: A retrospective cohort study.
Subjects: At 2 sites within the Moorfields Eye Hospital network in London, United Kingdom, 25 816 corneal scans from 5005 patients, including 3605 with KC and 1400 healthy control patients, were acquired between 2020 and 2024.
Methods: Principal component analysis (PCA) followed by Gaussian mixture modeling (GMM) was applied to AS-OCT-derived data, including 20 KC indices and patient age, to identify SKC eyes, which were then statistically compared against healthy and KC eyes. Subclinical KC eyes were also validated against external systems including same-day Pentacam (Oculus Optikgeräte) scans, Belin-Ambrosio's ABCD system, KC progression criteria determined by a panel of corneal specialists, and the Moorfields Corneal Cross-linking (CXL) Risk Calculator.
Main Outcome Measures: Detection of SKC and progression of these eyes to clinically diagnosable KC over time.
Results: The GMM identified 166 eyes from 161 patients whose structural features were distinct from those of both healthy and KC eyes. These eyes clustered in the morphometric transition zone in PCA space and were predominantly classified as ABCD stage 0. However, compared with healthy eyes, they demonstrated asymmetry with their fellow eye, higher predicted CXL risk at 1-4 years (P < 0.001), and faster progression to KC (log-rank P < 0.0001). Among SKC eyes with longitudinal data, 72.7% met Global Consensus criteria for progression.
Conclusions: Subclinical KC remains challenging to detect, and although classic staging such as ABCD retains clinical utility, it is insufficient for early disease detection. Principal component analysis followed by GMM classification on a multidimensional AS-OCT dataset identifies a distinct and high-risk SKC group. This semisupervised framework offers a complementary tool for early risk stratification and can be applied to new patients via projection into the learned PCA space and computation of KC probability. Threshold values corresponding to the 25th and 75th percentiles of KC probability for each parameter may serve as clinical context for flagging eyes when multiple features fall in the atypical range.
Financial Disclosures: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
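The application to new patients described in the Conclusions (projection into a learned PCA space, then computation of KC probability) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes scikit-learn, uses purely synthetic numbers in place of the 20 KC indices and age, and makes several choices the abstract does not specify: standardization before PCA, 2 retained components, 2 mixture components, and seeding the mixture with the means of the labelled groups. The function name `kc_probability` is hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

N_FEATURES = 21  # 20 KC indices plus age, per the abstract; the values below are synthetic

# Synthetic stand-ins for labelled training eyes (assumption: NOT real data).
rng = np.random.default_rng(0)
healthy = rng.normal(loc=0.0, scale=1.0, size=(300, N_FEATURES))
kc = rng.normal(loc=3.0, scale=1.5, size=(300, N_FEATURES))
X = np.vstack([healthy, kc])
labels = np.array([0] * 300 + [1] * 300)  # 0 = healthy, 1 = KC

# Learn the PCA space on standardized features (both choices are assumptions).
scaler = StandardScaler().fit(X)
pca = PCA(n_components=2, random_state=0).fit(scaler.transform(X))
Z = pca.transform(scaler.transform(X))

# Seed the mixture with the labelled group means so that component 0 starts
# at the healthy cluster and component 1 at the KC cluster (assumed form of
# the semisupervised step).
means_init = np.vstack([Z[labels == 0].mean(axis=0), Z[labels == 1].mean(axis=0)])
gmm = GaussianMixture(
    n_components=2,
    covariance_type="full",
    means_init=means_init,
    random_state=0,
).fit(Z)


def kc_probability(new_eyes):
    """Project new eyes into the learned PCA space and return the posterior
    probability of membership in the KC-seeded mixture component."""
    z = pca.transform(scaler.transform(np.atleast_2d(new_eyes)))
    return gmm.predict_proba(z)[:, 1]


# Two synthetic eyes: one at the healthy centre, one at the KC centre.
example = np.vstack([np.zeros(N_FEATURES), np.full(N_FEATURES, 3.0)])
print(kc_probability(example))
```

In this sketch, an eye whose KC probability is neither close to 0 nor close to 1 would sit between the two clusters in PCA space. How such intermediate eyes are formally assigned to the SKC group is not stated in the abstract and is left out here.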