Diagnostic Accuracy of Artificial Intelligence for the Detection of Papilledema on Fundus Images: A Systematic Review and Meta-Analysis.
Samendra Karkhur, Priti Singh, Vidhya Verma, Rama Tulasi Siri Duddumpudi, Arushi Beri, Sumit Satoiya
Abstract
Papilledema, a vision-threatening manifestation of raised intracranial pressure, must be recognized quickly to avoid permanent optic nerve injury. Artificial intelligence (AI) techniques, particularly deep learning models, are promising for automating fundus-based detection of papilledema, but their reported diagnostic accuracy varies across studies because of differences in datasets, grading criteria, and validation methodologies. A systematic review and meta-analysis was performed according to the PRISMA-DTA (Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies) 2020 guidelines, searching PubMed, Scopus, Embase, IEEE Xplore, and Google Scholar for studies published from January 2021 to August 2025. Studies that evaluated an AI-based papilledema detection system on fundus photographs against a human or imaging reference standard were eligible. Data from the included studies were combined using a bivariate random-effects model to estimate sensitivity, specificity, and the area under the curve (AUC). Risk of bias was appraised with the Quality Assessment of Diagnostic Accuracy Studies tool (QUADAS-2). Six studies comprising approximately 14,650 fundus images were included. The pooled sensitivity was 94.6% (95% CI: 91.2-97.1) and the pooled specificity was 90.3% (95% CI: 86.1-93.5). The pooled area under the summary receiver operating characteristic (SROC) curve was 0.94, indicating excellent discriminative performance. Between-study heterogeneity was moderate (I² = 42%) and was attributed to differences in dataset size, imaging modalities, and reference standards. Deep learning-based models such as ResNet, DenseNet, and EfficientNet consistently outperformed conventional machine-learning algorithms.
Publication bias was not significant. AI-based analysis of fundus images was highly accurate and clinically valid for detecting papilledema, on par with expert assessment. Further validation in diverse populations, together with integration of additional modalities such as optical coherence tomography (OCT) and ultrasound, could help position AI systems as scalable triage platforms in emergency, neurology, and teleophthalmology settings, thereby increasing access to neuro-ophthalmic services.
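To make the pooling and heterogeneity statistics concrete, the following is a minimal sketch of how a pooled sensitivity, its 95% CI, and I² can be derived from per-study counts. It is not the review's analysis. It assumes a simpler univariate DerSimonian-Laird random-effects model on the logit scale rather than the bivariate model named above, and the function name `pool_proportion` and all counts are hypothetical, chosen only for illustration.

```python
import math


def inv_logit(x):
    """Map a logit value back to a proportion in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def pool_proportion(events, totals):
    """Pool per-study proportions (e.g. sensitivity = TP / diseased).

    Univariate DerSimonian-Laird random-effects pooling on the logit
    scale. This is a simplification: a bivariate model would pool
    sensitivity and specificity jointly.
    """
    logits, variances = [], []
    for e, n in zip(events, totals):
        a = e + 0.5          # 0.5 continuity correction avoids log(0)
        b = n - e + 0.5
        logits.append(math.log(a / b))
        variances.append(1.0 / a + 1.0 / b)

    # Fixed-effect (inverse-variance) weights and Cochran's Q
    w = [1.0 / v for v in variances]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, logits)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, logits))
    df = len(logits) - 1

    # Between-study variance (tau^2) and I^2 as a percentage
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random-effects weights, pooled estimate, and 95% CI
    w_re = [1.0 / (v + tau2) for v in variances]
    y_re = sum(wi * yi for wi, yi in zip(w_re, logits)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {
        "pooled": inv_logit(y_re),
        "ci_low": inv_logit(y_re - 1.96 * se),
        "ci_high": inv_logit(y_re + 1.96 * se),
        "i2": i2,
    }


# Hypothetical counts, NOT taken from the included studies:
# true positives and number of papilledema eyes per study.
true_positives = [90, 180, 47]
diseased = [100, 200, 50]
result = pool_proportion(true_positives, diseased)
print(result)
```

The same function can be applied to true negatives and non-diseased totals to obtain a pooled specificity. By convention, I² values around 25%, 50%, and 75% are read as low, moderate, and high heterogeneity.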