NPJ digital medicine
Reasoning red teaming in healthcare not all paths to a desired outcome are desirable.
Vera Sorin, Panagiotis Korfiatis, Girish N Nadkarni, Eyal Klang
Published: 202510.1038/s41746-025-02104-0
Abstract
Open AccessChang et al. showed that large language models can produce unsafe or biased outputs even when superficially accurate. We highlight that LLMs can hide harmful reasoning if only final responses are red-teamed. Monitoring intermediate inference steps, especially in ethically charged clinical scenarios, can reveal manipulative or unethical thought processes. We propose systematic testing of ethically sensitive prompts and thorough chain-of-thought analysis to ensure safe, trustworthy deployment in healthcare.