Studies in Health Technology and Informatics

Natural Language Processing; Humans; Semantics

MedPromptEval: A Comprehensive Framework for Systematic Evaluation of Clinical Question Answering Systems.

Al Rahrooh, Anders O Garlid, Panayiotis Petousis, Arthur Funnell, Alex A T Bui

Published: 2025
DOI: 10.3233/SHTI251540

Abstract

Clinical deployment of large language models (LLMs) faces critical challenges, including inconsistent prompt performance, variable model behavior, and a lack of standardized evaluation methodologies. We present MedPromptEval, a framework that systema…
