Evaluating ChatGPT 4.0 as a Tool for Nuclear Medicine Board Preparation.
Pierce Herrmann, Kayvon Yazdanbakhsh, Golnaz Lotfian, Keyur Parekh, Sumeet Virmani, Alex Tegeler, Pokhraj P Suthar
Abstract
Large language models (LLMs), a type of generative artificial intelligence (AI), have become increasingly popular because of their potential use in medical education. This study assesses the accuracy of ChatGPT 4.0 (OpenAI, San Francisco, CA) in answering multiple-choice questions from a standardized board preparation resource for nuclear medicine certification examinations. A total of 115 text-based questions from 12 chapters were selected; image-dependent questions were excluded because of ChatGPT's restriction to text-only input. Overall and section-by-section accuracy were calculated by comparing the model's responses with the official answer key. ChatGPT achieved an overall accuracy of 86.95%, with its lowest performance in pediatric nuclear medicine (75%) and perfect scores in nuclear cardiology and radiopharmacy. Model performance did not correlate with the number of questions per chapter. These results suggest that ChatGPT may be a useful adjunct to radiology education; however, topic-level performance variation and opaque reasoning underscore the need for further research before wider educational integration.
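The scoring step described in the abstract, comparing model responses with an answer key and reporting accuracy per section and overall, can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `accuracy_by_section`, the dictionary layout, and the four toy questions are hypothetical and serve only to show the arithmetic.

```python
from collections import defaultdict


def accuracy_by_section(responses, answer_key, sections):
    """Return (per_section, overall) accuracy as fractions in [0, 1].

    responses  -- maps question id to the model's chosen option
    answer_key -- maps question id to the correct option
    sections   -- maps question id to its chapter/section name
    All three layouts are assumptions made for this illustration.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for qid, key in answer_key.items():
        section = sections[qid]
        total[section] += 1
        # A missing response counts as incorrect.
        if responses.get(qid) == key:
            correct[section] += 1
    per_section = {s: correct[s] / total[s] for s in total}
    overall = sum(correct.values()) / sum(total.values())
    return per_section, overall


# Hypothetical toy data, not taken from the study.
answer_key = {"q1": "A", "q2": "C", "q3": "B", "q4": "D"}
sections = {
    "q1": "nuclear cardiology",
    "q2": "nuclear cardiology",
    "q3": "pediatric nuclear medicine",
    "q4": "pediatric nuclear medicine",
}
responses = {"q1": "A", "q2": "C", "q3": "B", "q4": "A"}

per_section, overall = accuracy_by_section(responses, answer_key, sections)
for name, acc in per_section.items():
    print(f"{name}: {acc:.2%}")
print(f"overall: {overall:.2%}")
```

Overall accuracy here is pooled over all questions rather than averaged across sections, so sections with more questions carry more weight in the total.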