Benchmarking diffusion models against state-of-the-art architectures for OCT fluid biomarker segmentation.
Katherine Du, Utkarsh Doshi, Benjamin DiCenzo, Jessica Jiang, Ethan Wu, Adarsh Gadari, Sharat Chandra Vupparaboina, Elham Sadeghi, Sandeep Chandra Bollepalli, José-Alain Sahel, Jay Chhablani, Kiran Kumar Vupparaboina
Abstract
OBJECTIVES: Retinal diseases, major causes of vision impairment and blindness, are assessed using optical coherence tomography (OCT) scans. Automated report generation for retinal OCT scans, powered by deep learning, can help standardize interpretation and track retinal disease in clinical settings. A key challenge is accurately segmenting retinal disease signatures. This study explores using a diffusion model to segment subretinal fluid (SRF), intraretinal fluid (IRF), and pigment epithelial detachment (PED) in typical clinical settings and compares its performance with that of other leading segmentation models. METHODS: We labeled OCT scans and extracted those with specific pathologic retinal features: 269 scans with SRF, 224 scans with IRF, and 114 scans with PED. Three trained reviewers manually segmented these features for downstream analysis. Using the manual segmentations as ground truth, we trained a diffusion model, Nested U-Net, nnU-Net, TransUNet, and SwinUNet to predict these segmentations. All models were evaluated using 5-fold cross-validation, with performance measured by the Dice coefficient, sensitivity, specificity, Pearson correlation coefficient, and R². RESULTS: All models show high similarity to the ground-truth segmentations for SRF, IRF, and PED, as measured by the Dice coefficient (diffusion model: 0.81 ± 0.12, 0.66 ± 0.09, and 0.75 ± 0.11, respectively). The diffusion model has higher sensitivity than most other models, while all models display very high specificity. The Pearson correlation coefficient and R² values indicate strong agreement between predicted and ground-truth segmented areas (pixel counts) across models, with nnU-Net performing strongest overall. CONCLUSION: This study demonstrates that although diffusion models can segment retinal pathologies comparably well using a limited number of manually annotated scans, nnU-Net remains the most effective model overall for automated OCT analysis.
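The abstract names its evaluation metrics but does not say exactly how they were computed. The following is a minimal sketch of the standard definitions, assuming each scan's prediction and ground truth are binary NumPy masks of the same shape. All function names are hypothetical. Two conventions here are assumptions rather than the study's stated method: Dice is set to 1.0 when both masks are empty, and R² is computed as the coefficient of determination of the ground-truth areas, using the predicted areas directly as estimates. The study may instead have used, for example, the squared Pearson coefficient from a linear fit.

```python
import numpy as np


def confusion_counts(pred, truth):
    """Return (tp, fp, fn, tn) pixel counts for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    if pred.shape != truth.shape:
        raise ValueError("pred and truth must have the same shape")
    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    tn = int(np.sum(~pred & ~truth))
    return tp, fp, fn, tn


def dice(pred, truth):
    """Dice = 2|P & G| / (|P| + |G|); returns 1.0 if both masks are empty (assumed convention)."""
    tp, fp, fn, _ = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom


def sensitivity(pred, truth):
    """Fraction of ground-truth fluid pixels that were predicted as fluid: TP / (TP + FN)."""
    tp, _, fn, _ = confusion_counts(pred, truth)
    return float("nan") if tp + fn == 0 else tp / (tp + fn)


def specificity(pred, truth):
    """Fraction of background pixels correctly left unlabeled: TN / (TN + FP)."""
    _, fp, _, tn = confusion_counts(pred, truth)
    return float("nan") if tn + fp == 0 else tn / (tn + fp)


def area_agreement(pred_masks, truth_masks):
    """Pearson r and R^2 between per-scan predicted and ground-truth areas (pixel counts).

    R^2 here is 1 - SS_res / SS_tot with predicted areas used directly as estimates
    of the ground-truth areas (an assumption; other definitions exist).
    """
    pred_area = np.array([np.count_nonzero(p) for p in pred_masks], dtype=float)
    true_area = np.array([np.count_nonzero(t) for t in truth_masks], dtype=float)
    ss_tot = float(np.sum((true_area - true_area.mean()) ** 2))
    if ss_tot == 0 or np.all(pred_area == pred_area[0]):
        raise ValueError("areas must vary across scans for r and R^2 to be defined")
    r = float(np.corrcoef(pred_area, true_area)[0, 1])
    ss_res = float(np.sum((true_area - pred_area) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    return r, r2
```

For example, a 4-pixel ground-truth square and a 6-pixel prediction that fully covers it give TP = 4, FP = 2, FN = 0, so Dice = 8 / 10 = 0.8 and sensitivity = 1.0. Under this sketch, the pixel-level metrics are computed per scan, and the area metrics are computed across a set of scans (for instance, one cross-validation fold).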