Generative AI in medical education: feasibility and educational value of LLM-generated clinical cases with MCQs.
Qi Zhang, Zijing Huang, Yuqiang Huang, Geng Wang, Riping Zhang, Jianling Yang, Yinglin Cheng, Binyao Chen, Hongxi Wang, Kunliang Qiu, Haoyu Chen
Abstract
OBJECTIVE: To evaluate the feasibility and educational value of using large language models (LLMs) to generate clinical case scenarios with multiple-choice questions (MCQs) for undergraduate medical education.
METHODS: Twelve ophthalmology clinical case scenarios with MCQs generated by ChatGPT 4.0 were assessed for quality by eight teachers. High-scoring cases with MCQs were selected for review classes to test students' learning. Student perceptions were collected via in-class and after-class questionnaires using a 5-point Likert scale.
RESULTS: The average quality score of the 12 cases with MCQs was 52.33 ± 5.44 (range: 48-54.25; max = 60). Teachers' scores for the same clinical cases differed significantly (F = 16.050, P < 0.001). Among 20 students, 95% agreed that AI-generated cases enriched learning resources and 80% reported improved interdisciplinary integration and learning efficiency; 85% used LLMs for post-class practice but raised concerns about content accuracy and difficulty calibration.
CONCLUSION: LLMs such as ChatGPT can rapidly generate clinically relevant case scenarios and MCQs when given precise prompts, offering a novel tool for educators and learners. However, expert review remains critical to mitigate the risk of AI hallucinations (observed in 16.67% of cases, 2/12) and to ensure alignment with curricular standards. Key issues included contradictions in imaging descriptions (e.g., inappropriate use of high-frequency ultrasound for chalazion) and in diagnostic logic (e.g., inconsistent gonioscopy findings), underscoring the need for human oversight to refine content accuracy and educational utility.
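As a quick arithmetic check on the reported proportions, the sketch below recomputes the hallucination rate from the stated 2 of 12 cases and converts the student percentages back to head counts. It assumes all 20 students answered every questionnaire item, which the abstract does not state explicitly; the helper names `pct` and `count_from_pct` are illustrative and not from the study.

```python
def pct(count, total):
    """Return count/total as a percentage rounded to two decimals."""
    return round(100 * count / total, 2)


def count_from_pct(percent, total):
    """Return the whole-number count implied by a percentage of total."""
    return round(percent * total / 100)


# Hallucination rate: 2 flawed cases out of 12 generated cases.
hallucination_rate = pct(2, 12)

# Head counts implied by the reported percentages,
# assuming (not stated in the abstract) that all 20 students responded.
student_counts = {p: count_from_pct(p, 20) for p in (95, 80, 85)}

print(hallucination_rate)
print(student_counts)
```

Under that assumption, the percentages correspond to 19, 16, and 17 of the 20 students, respectively.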