Prompt engineering for single-best-answer multiple-choice questions in licensing examinations: a narrative review with a case study involving the Korean Medical Licensing Examination.
Bokyoung Kim, Junseok Kang, Min-Young Kim, Jihyun Ahn
Abstract
The emergence of large language models (LLMs) has generated growing interest in their potential applications for medical assessment and item development. This practice-oriented narrative review examines the potential of LLMs, particularly ChatGPT, for generating and validating single-best-answer multiple-choice questions in health professions licensing examinations, using a case perspective focused on the Korean Medical Licensing Examination (KMLE). We frame LLMs as human-in-the-loop tools rather than as replacements for expert judgment in high-stakes testing. Recent applications of LLMs in assessment were reviewed, including prompting strategies such as few-shot, multi-stage, and chain-of-thought prompting, as well as retrieval-augmented generation (RAG) for aligning outputs with exam blueprints. Approaches to enforcing formatting rules, checklist-based self-validation, and iterative refinement were analyzed for their role in supporting item development. Findings indicate that LLMs can perform near passing thresholds on high-stakes exams and can assist with grading and feedback tasks. Prompt engineering improves structural fidelity and clinical plausibility, whereas human oversight remains critical for accuracy, cultural appropriateness, and psychometric defensibility. Emerging multimodal generation of images, audio, and video suggests that new item formats are feasible, provided robust validation safeguards are implemented. The most effective approach is a human-in-the-loop workflow that leverages the efficiency of artificial intelligence while embedding expert judgment, psychometric evaluation, and ethical governance. This practice-oriented roadmap, which integrates strategic prompt selection, RAG-based blueprint alignment, rigorous validation gates, and KMLE-specific formatting, offers an implementable and methodologically defensible approach for licensing examinations.
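To illustrate what a rule-based formatting gate might look like before an LLM-drafted item reaches expert review, the following is a minimal Python sketch. It is not the authors' method. The `SBAItem` structure, the `validate_item` function, the specific checklist rules, and the default of five options are all hypothetical assumptions chosen for illustration, not KMLE specifications. A gate like this screens only structure; clinical accuracy and psychometric quality still require human reviewers.

```python
from dataclasses import dataclass


# Hypothetical item structure; field names are illustrative, not a KMLE standard.
@dataclass
class SBAItem:
    stem: str           # clinical vignette
    lead_in: str        # the question posed to the examinee
    options: list[str]  # answer choices
    key: int            # index of the single best answer within options


def validate_item(item: SBAItem, n_options: int = 5) -> list[str]:
    """Return a list of checklist violations; an empty list means the item passes this gate.

    n_options=5 is an assumed default and should be set to the exam's actual format.
    """
    problems = []
    if not item.stem.strip():
        problems.append("empty stem")
    # Assumed rule: the lead-in should be phrased as a question.
    if not item.lead_in.strip().endswith("?"):
        problems.append("lead-in is not phrased as a question")
    if len(item.options) != n_options:
        problems.append(f"expected {n_options} options, found {len(item.options)}")
    normalized = [o.strip().lower() for o in item.options]
    if len(set(normalized)) != len(normalized):
        problems.append("duplicate options")
    # Assumed rule: absolute options undermine single-best-answer logic.
    if any(o in ("all of the above", "none of the above") for o in normalized):
        problems.append("uses 'all/none of the above'")
    if not 0 <= item.key < len(item.options):
        problems.append("answer key out of range")
    return problems


if __name__ == "__main__":
    item = SBAItem(
        stem="A 58-year-old man presents with crushing chest pain for 1 hour.",
        lead_in="What is the most likely diagnosis?",
        options=["Acute myocardial infarction", "Aortic dissection",
                 "Pulmonary embolism", "Pericarditis", "Pneumothorax"],
        key=0,
    )
    print(validate_item(item))  # prints []
```

In a human-in-the-loop workflow, any non-empty result could be fed back to the model for iterative refinement, while items that pass would still go to content experts.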