Multi-Center Validation of Artificial Intelligence-Based Video Analysis Platform for Automatic Evaluation of Swallowing Disorders.
Chang-Won Jeong, Dong-Wook Lim, Si-Hyeong Noh, Hee-Kyung Moon, Chul Park, Nayeon Ko, Min-Su Kim
Abstract
Background: Videofluoroscopic swallow study (VFSS) is a key examination for assessing swallowing function. Although several artificial intelligence (AI) models for VFSS interpretation have shown high predictive accuracy in internal validation, few have undergone external validation. This study aimed to develop an AI model that automatically diagnoses aspiration and penetration from VFSS videos and to evaluate its performance through multicenter external validation. Methods: Of the 2343 VFSS videos collected, 309 Q1-grade videos, which were free of artifacts and clearly showed the airway and vocal cords, were included in the internal validation dataset. The data were divided into training, internal validation, and test sets at a 7:1:2 ratio, with 2012 images (aspiration = 532, penetration = 932, no airway invasion = 548) used for training. The AI model was developed and trained using You Only Look Once version 9, model c (YOLOv9_c). External validation of the AI model was conducted using 138 Q1- and Q2-grade VFSS videos from two different hospitals. Results: In internal validation, the YOLOv9_c model showed a training accuracy of 98.1%, a validation accuracy of 97.8%, and a test accuracy of 61.5%. In the confusion matrix analysis, the AI model's diagnostic accuracy on VFSS videos was 0.76 for aspiration (AUC = 0.70) and 0.66 for penetration (AUC = 0.65). In external validation, the AI model demonstrated good performance in diagnosing aspiration (precision: 90.2%, AUC = 0.79) and penetration (precision: 78.3%, AUC = 0.80). The overall diagnostic accuracy of external validation for VFSS videos was 80.4%. Conclusions: We developed an AI model that automatically diagnoses aspiration and penetration when an entire VFSS video is input, and external validation showed good accuracy. Future research that trains and validates the model on VFSS video data from more hospitals is needed to improve its performance and facilitate its clinical application.
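The abstract reports a 7:1:2 data split and per-class figures (accuracy, precision) derived from a confusion matrix. As a minimal sketch of how such quantities are commonly computed, and not the authors' actual pipeline, the Python below splits a list of items at a 7:1:2 ratio and derives one-vs-rest precision and accuracy from a three-class confusion matrix. The function names, the random seed, and the class ordering (aspiration, penetration, no airway invasion) are assumptions made for illustration; the abstract does not describe how the split or the metrics were implemented.

```python
import random

# Hypothetical class ordering, chosen for illustration only.
CLASSES = ["aspiration", "penetration", "no_airway_invasion"]


def split_7_1_2(items, seed=0):
    """Shuffle items and split them into train/validation/test at 7:1:2."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(0.7 * len(shuffled))
    n_val = round(0.1 * len(shuffled))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder, roughly 20%
    return train, val, test


def one_vs_rest_metrics(confusion, target):
    """Precision and accuracy for one class treated as 'positive'.

    confusion[i][j] is the number of cases whose true class is i
    and whose predicted class is j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp = confusion[target][target]
    fp = sum(confusion[i][target] for i in range(k) if i != target)
    fn = sum(confusion[target][j] for j in range(k) if j != target)
    tn = total - tp - fp - fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "accuracy": accuracy}


def overall_accuracy(confusion):
    """Fraction of all cases on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total if total else 0.0
```

Calling `one_vs_rest_metrics(matrix, CLASSES.index("aspiration"))` on a confusion matrix would give the aspiration-versus-rest figures; AUC is not covered here because it requires per-case prediction scores rather than a confusion matrix alone.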