Hybrid artificial intelligence frameworks for otoscopic diagnosis: Integrating convolutional neural networks and large language models toward real-time mobile health.
Yuan-Chia Chu, Yen-Chi Chen, Chien-Yeh Hsu, Chen-Tsung Kuo, Yen-Fu Cheng, Kuan-Hsun Lin, Wen-Huei Liao
Abstract
Open AccessBackground: Otitis media remains a significant global health concern, particularly in resource-limited settings where timely diagnosis is challenging. Artificial intelligence (AI) offers promising solutions to enhance diagnostic accuracy in mobile health applications. Objective: This study introduces a hybrid AI framework that integrates convolutional neural networks (CNNs) for image classification with large language models (LLMs) for clinical reasoning, enabling real-time otoscopic diagnosis. Methods: We developed a dual-path system combining CNN-based feature extraction with LLM-supported interpretation. The framework was optimized for mobile deployment, with lightweight models operating on-device and advanced reasoning performed via secure cloud APIs. A dataset of 10,465 otoendoscopic images (expanded from 2820 original clinical images through data augmentation) across 10 middle-ear conditions was used for training and validation. Diagnostic performance was benchmarked against clinicians of varying expertise. Results: The hybrid CNN-LLM system achieved an overall diagnostic accuracy of 97.6%, demonstrating the synergistic benefit of combining CNN-driven visual analysis with LLM-based clinical reasoning. The system delivered sub-200 ms feedback and achieved specialist-level performance in identifying common ear pathologies. Conclusions: This hybrid AI framework substantially improves diagnostic precision and responsiveness in otoscopic evaluation. Its mobile-friendly design supports scalable deployment in telemedicine and primary care, offering a practical solution to enhance ear disease diagnosis in underserved regions.