Cureus

Artificial Intelligence-Powered Interpretation of Corneal Epithelial Maps: A Comparative Pilot Study of ChatGPT, Google Gemini, and Microsoft Bing.

Ruchi Shukla, Aparajita Shukla, Ashutosh K Mishra, Pragati Garg, Nilakshi Banerjee, Swarastra P Singh, Shrinkhal

Published: 2025

DOI: 10.7759/cureus.90779

Abstract

Open Access

Background

This study aimed to compare the diagnostic accuracy and clinical suitability of three generative artificial intelligence (AI) models, i.e., ChatGPT 4.0, Google Gemini, and Microsoft Bing, in interpreting corneal epithelial thickness (CET) data across key ocular surface disorders, including keratoconus, vernal keratoconjunctivitis (VKC), and nasal pterygium.

Methodology

Standardized case scenarios with corresponding CET mapping data were constructed and input into all three AI platforms with the following query: "Evaluate the given CET map and provide the most likely diagnosis and appropriate clinical recommendation." Responses were independently graded by a panel of three ophthalmologists for diagnostic accuracy and clinical appropriateness. Cases were selected based on known CET signature patterns derived from the literature, including doughnut patterns in keratoconus, superior thinning in VKC, and nasal epithelial thickening in nasal pterygium.

Results

Across the 15 case scenarios evaluated by each AI model (five each of keratoconus, VKC, and nasal pterygium), ChatGPT showed the highest diagnostic accuracy (80%) and clinical appropriateness (87%). Google Gemini gave the correct diagnosis in 60% of cases, and its recommendations were deemed clinically appropriate in 67%. Microsoft Bing yielded 53% correct diagnoses and 60% appropriate clinical recommendations.

Conclusions

ChatGPT 4.0 consistently outperformed Google Gemini and Microsoft Bing in CET interpretation for common ocular surface diseases. These findings suggest that ChatGPT may serve as a valuable adjunct in AI-assisted ophthalmic diagnostics, particularly for ocular surface diseases in which subtle epithelial remodeling is crucial for early identification. While diagnostic accuracy was the primary outcome, the clinical recommendations suggested by ChatGPT aligned with standard protocols in most responses (87%), highlighting its clinical utility.
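The percentages reported in the Results can be checked for consistency with the stated sample of 15 cases. The sketch below is illustrative only: it assumes each percentage was computed over all 15 scenarios and rounded to the nearest whole number, and the case counts it prints are back-calculated from the abstract, not taken from the study data. The function name `implied_count` and the dictionary layout are hypothetical.

```python
# Sanity check: which whole-number case counts (out of 15) are consistent
# with the percentages reported in the abstract?
# ASSUMPTION: every percentage uses all 15 cases as its denominator and is
# rounded to the nearest integer. The counts are inferred, not reported.

TOTAL_CASES = 15  # five each of keratoconus, VKC, and nasal pterygium

# Percentages exactly as reported in the abstract.
reported = {
    "ChatGPT 4.0": {"diagnosis": 80, "recommendation": 87},
    "Google Gemini": {"diagnosis": 60, "recommendation": 67},
    "Microsoft Bing": {"diagnosis": 53, "recommendation": 60},
}


def implied_count(percent, total=TOTAL_CASES):
    """Return the count k for which round(100 * k / total) == percent.

    Returns None when no whole number of cases reproduces the percentage.
    """
    for k in range(total + 1):
        if round(100 * k / total) == percent:
            return k
    return None


for model, scores in reported.items():
    for outcome, pct in scores.items():
        k = implied_count(pct)
        print(f"{model:15s} {outcome:15s} {pct:3d}% -> {k}/{TOTAL_CASES}")
```

Under these assumptions, every reported percentage corresponds to a whole number of cases out of 15, so the figures are internally consistent with the stated sample size.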