NPJ digital medicine

Reasoning red teaming in healthcare not all paths to a desired outcome are desirable.

Vera Sorin, Panagiotis Korfiatis, Girish N Nadkarni, Eyal Klang

Published: 202510.1038/s41746-025-02104-0

Abstract

Open Access

Chang et al. showed that large language models can produce unsafe or biased outputs even when superficially accurate. We highlight that LLMs can hide harmful reasoning if only final responses are red-teamed. Monitoring intermediate inference steps, especially in ethically charged clinical scenarios, can reveal manipulative or unethical thought processes. We propose systematic testing of ethically sensitive prompts and thorough chain-of-thought analysis to ensure safe, trustworthy deployment in healthcare.

View at DOI