Studies in health technology and informatics
Natural Language Processing; Humans; Semantics
MedPromptEval: A Comprehensive Framework for Systematic Evaluation of Clinical Question Answering Systems.
Al Rahrooh, Anders O Garlid, Panayiotis Petousis, Arthur Funnell, Alex A T Bui
Published: 2025
DOI: 10.3233/SHTI251540
Abstract
Clinical deployment of large language models (LLMs) faces critical challenges, including inconsistent prompt performance, variable model behavior, and a lack of standardized evaluation methodologies. We present MedPromptEval, a framework that systema…