Variability of ChatGPT in Interpreting the Lexicon of ACR-TIRADS, EU-TIRADS, and K-TIRADS.
Pierpaolo Trimboli, Amos Colombo, Lorenzo Ruinelli, Andrea Leoncini
Abstract
Background: An international Thyroid Imaging Reporting And Data System (I-TIRADS) is being developed to harmonize the terminology of guidelines for reporting thyroid ultrasonography. As artificial intelligence (AI) is attracting increasing attention in the thyroid field, solid information on how consistently AI interprets the TIRADS terminology is relevant before the I-TIRADS is published. The present study aimed to examine how AI interprets the TIRADS terminology used to describe thyroid nodules (TNs). Methods: Three TIRADSs, from the USA (ACR-TIRADS), Europe (EU-TIRADS), and Asia (K-TIRADS), were considered. The most popular AI tool, ChatGPT, was tested. All possible combinations of terms from the three TIRADSs were evaluated. Results: In total, 2592 cases were included. With the ACR-TIRADS lexicon, there was a marginally significant difference between systems (p = 0.0494), which was attributed to variations between ACR- and EU-TIRADS (p = 0.0099). With the EU-TIRADS lexicon, there was a significant difference between systems (p < 0.0001), with a significant difference between EU- and ACR-TIRADS (p = 0.0003). Using the K-TIRADS terminology, no significant difference was observed (p = 0.7954). The intraobserver agreement of ChatGPT was moderate, with the best values (from 0.55 to 0.60) obtained with the K-TIRADS lexicon. Conclusions: ChatGPT interprets the TIRADS lexicon, but with variations when it is asked to assess TNs according to one TIRADS using the terminology of another TIRADS. Clinical operators and patients should be aware of these novel data.
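The Methods step of generating all possible combinations of terms can be pictured as a Cartesian product over descriptor categories. The following is a minimal sketch only: the category names and term lists are a reduced, illustrative subset chosen for this example, and `enumerate_cases` is a hypothetical helper. It does not reproduce the lexicons or the 2592 cases used in the study.

```python
from itertools import product

# Illustrative, reduced descriptor lists (assumption: NOT the study's full lexicon).
descriptors = {
    "composition": ["solid", "mixed cystic and solid"],
    "echogenicity": ["isoechoic", "hypoechoic", "very hypoechoic"],
    "shape": ["wider-than-tall", "taller-than-wide"],
}


def enumerate_cases(descriptors):
    """Return one dict per combination, taking one term from each category."""
    names = list(descriptors)
    return [dict(zip(names, combo)) for combo in product(*descriptors.values())]


cases = enumerate_cases(descriptors)
print(len(cases))  # 2 * 3 * 2 = 12 combinations for this reduced example
```

Each resulting dictionary describes one hypothetical nodule, which could then be presented for classification under each TIRADS.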