Beyond Reporting: Claude 3.7 Sonnet Accurately Classifies T Stage and Uncovers Omitted Anatomic Invasion in Nasopharyngeal Carcinoma Magnetic Resonance Imaging (MRI) Reports.
Yusuke Asari, Ryo Kurokawa, Akifumi Hagiwara, Mariko Kurokawa, Yuki Sonoda, Jun Kanzawa, Shota Fujisawa, Sousuke Hatano, Wataru Gonoi, Osamu Abe
Abstract
Purpose: This study aimed to determine whether Claude 3.7 Sonnet (Anthropic, San Francisco, CA, USA), a large language model (LLM), can (i) assign nasopharyngeal carcinoma (NPC) T classification from routine magnetic resonance imaging (MRI) reports and (ii) identify unreported anatomical structures whose invasion would warrant a higher T stage.
Materials and methods: This single-institution retrospective study included 38 consecutive patients (31 men; mean age 59.7 ± 14.9 years) who underwent pretreatment MRI for NPC between April 1999 and March 2025. De-identified unstructured "Findings" sections were submitted once to Claude 3.7 Sonnet (temperature = 0), prompting the model to assign a T stage according to the American Joint Committee on Cancer/Union for International Cancer Control 9th Edition and to list potentially missed invasive sites. Reference-standard staging and relevant omissions were established independently by two radiologists. Model accuracy for T classification and for detecting missing structures was calculated; false-positive flags were recorded. Radiologists re-evaluated MR images for any stage change prompted by the LLM.
Results: The LLM reproduced the reference T category in 35/38 patients (92.1%). Category-specific accuracy was 100% for T1 (9/9) and T4 (13/13), 90% for T3 (9/10), and 66.7% for T2 (4/6). Among 208 eligible unmentioned structures, the model correctly flagged 81 (38.9%), with a mean of 3.34 false-positive suggestions per case. Subsequent human review confirmed stage upgrades in 2/38 patients (5.3%), both corrected to T4 based on intracranial extension or cranial nerve involvement noted by the LLM.
Conclusion: Claude 3.7 Sonnet achieved high accuracy in T staging from unstructured free-text MRI reports for NPC and identified clinically important omissions, enabling radiologists to correct staging in select cases. LLM-assisted report auditing may improve staging quality and serve as an educational aid where subspecialty expertise is limited.
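The reported percentages follow directly from the counts given in the Results. The short Python sketch below recomputes them as a consistency check; it is only an illustration, and the variable and function names (`t_stage_counts`, `pct`, and so on) are assumptions of this example rather than anything described by the authors.

```python
# Recompute the percentages reported in the Results from the stated counts.
# All names here are illustrative; only the counts come from the abstract.

# (correctly classified, total) per reference T category
t_stage_counts = {"T1": (9, 9), "T2": (4, 6), "T3": (9, 10), "T4": (13, 13)}


def pct(numerator, denominator):
    """Return a proportion as a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# Category-specific accuracy
per_category = {stage: pct(c, n) for stage, (c, n) in t_stage_counts.items()}

# Overall accuracy: sum of correct classifications over all patients
total_correct = sum(c for c, _ in t_stage_counts.values())
total_cases = sum(n for _, n in t_stage_counts.values())
overall_accuracy = pct(total_correct, total_cases)

# Share of eligible unmentioned structures that the model correctly flagged
omission_detection_rate = pct(81, 208)

# Share of patients whose stage was upgraded after human review
stage_upgrade_rate = pct(2, 38)

print(per_category)
print(overall_accuracy, omission_detection_rate, stage_upgrade_rate)
```

The per-category totals sum to the 38 patients in the cohort and the correct classifications sum to 35, so the category-level and overall figures are mutually consistent.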