Cureus

Comparative Evaluation of Diagnostic and Management Capabilities of Infiniti AI and ChatGPT-4o in Corneal Diseases.

Abdulaziz Mohammad, Ali Bulbanat, Faisal Aljassar

Published: 2025; DOI: 10.7759/cureus.95163

Abstract

Open Access

BACKGROUND: Artificial intelligence (AI), particularly large language models (LLMs), is rapidly transforming medical education and clinical decision support. Ophthalmology, a specialty heavily reliant on pattern recognition, presents a promising domain for LLM integration. While general-purpose models like ChatGPT-4o have demonstrated strong performance in ophthalmic tasks, domain-specific systems such as Infiniti AI, built with a retrieval-augmented generation (RAG) framework, claim advantages by grounding responses in peer-reviewed ophthalmic literature. This study compares ChatGPT-4o (OpenAI, San Francisco, CA, USA) and Infiniti AI (Sinjab Academy, UAE) in corneal disease case scenarios.

MATERIALS AND METHODS: Twenty corneal cases were selected from the University of Iowa EyeRounds database, covering infectious, inflammatory, degenerative, developmental, and systemic associations. ChatGPT-4o, Infiniti AI, and a fellowship-trained cornea specialist independently evaluated each case. Diagnostic and management responses were scored against American Academy of Ophthalmology preferred practice pattern guidelines using a four-point scale (0-3). Statistical comparisons were performed using paired t-tests and Wilcoxon signed-rank tests.

RESULTS: ChatGPT-4o significantly outperformed Infiniti AI across all categories. Diagnostic accuracy was higher for ChatGPT-4o (2.37 ± 0.81) than for Infiniti AI (1.13 ± 0.71, p < 0.001, Cohen's d = 1.35). Management scores were also higher for ChatGPT-4o (2.65 ± 0.65 vs 1.98 ± 0.65, p < 0.001, d = 1.37). Overall, ChatGPT-4o achieved a mean total score of 5.00 ± 1.22 compared with 3.10 ± 1.10 for Infiniti AI (p < 0.001, d = 1.75).

CONCLUSIONS: ChatGPT-4o demonstrated greater diagnostic and management accuracy than Infiniti AI in corneal disease scenarios, highlighting the current strength of general-purpose LLMs over specialized retrieval-based systems. Nonetheless, both models remain prone to hallucinations and should serve as adjuncts to, rather than replacements for, expert judgment. Further refinement of ophthalmology-specific models is warranted to improve safety and clinical utility.
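
For readers who want to see how the paired comparisons named in the methods work, the following Python sketch computes a paired t statistic, a paired-samples effect size, and the Wilcoxon signed-rank sums for two models scored on the same cases. It is not the authors' analysis code. The scores are hypothetical placeholders on the 0-3 scale, the function names are illustrative, and using the paired effect size d_z (mean difference divided by the standard deviation of the differences) is an assumption, since the abstract does not state which variant of Cohen's d was used. The sketch stops at the test statistics and does not compute p-values.

```python
import math
import statistics


def paired_summary(a, b):
    """Return (mean difference, paired t statistic, Cohen's d_z) for paired scores."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the differences (n - 1 denominator)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t and d_z are undefined")
    t_stat = mean_diff / (sd_diff / math.sqrt(n))  # compare against t with n - 1 df
    d_z = mean_diff / sd_diff  # paired effect size (an assumption, see lead-in)
    return mean_diff, t_stat, d_z


def wilcoxon_signed_rank_sums(a, b):
    """Return (W_plus, W_minus): rank sums of positive and negative differences.

    Zero differences are dropped and tied absolute differences share their mean rank.
    """
    diffs = sorted((x - y for x, y in zip(a, b) if x != y), key=abs)
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(diffs):
        j = i
        while j < len(diffs) and abs(diffs[j]) == abs(diffs[i]):
            j += 1
        mean_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = mean_rank
        i = j
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus


# Hypothetical per-case scores (0-3) for two models on the same six cases.
# These are NOT data from the study.
model_a = [3, 2, 3, 1, 3, 2]
model_b = [1, 2, 2, 0, 1, 1]

mean_diff, t_stat, d_z = paired_summary(model_a, model_b)
w_plus, w_minus = wilcoxon_signed_rank_sums(model_a, model_b)
print(f"mean difference = {mean_diff:.2f}, t = {t_stat:.2f}, d_z = {d_z:.2f}")
print(f"W+ = {w_plus}, W- = {w_minus}")
```

Because both tests operate on per-case differences, they require the case-level scores and cannot be reproduced from the group means and standard deviations reported above. In practice, a statistics library such as SciPy (`scipy.stats.ttest_rel`, `scipy.stats.wilcoxon`) would also supply the p-values.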