Text-based prediction of immunohistochemical biomarkers in breast cancer using a generative large language model: a retrospective study.
Emre Utkan Büyükceran, Ayça Seyfettin, Andelib Babatürk, Zeynep Eskalen, Murat Bulut Özkan, Esin Kaymaz, Hüsnü Hakan Mersin, Fuldem Yıldırım Dönmez
Abstract
Purpose: Immunohistochemical (IHC) biomarkers such as estrogen receptor (ER), progesterone receptor (PR), HER2, and Ki-67 are essential for the classification and treatment of breast cancer. While radiomics-based models have shown potential for non-invasive biomarker prediction, the utility of large language models (LLMs) for this task using only textual clinical data remains largely unexplored. This study aimed to evaluate the performance of ChatGPT-4o, a generative LLM, in predicting key IHC biomarkers based solely on structured radiological and pathological reports.
Methods: Fifty-five patients with breast cancer were retrospectively analyzed. For each patient, structured clinical, imaging, and pathology reports (excluding IHC data) were entered into ChatGPT-4o. The model was prompted to predict ER, PR, HER2, and Ki-67 expression. Predictions were repeated at two time points to assess reproducibility. Diagnostic performance was evaluated against pathology results using accuracy, sensitivity, specificity, and Cohen's kappa.
Results: The model achieved the highest accuracy for HER2 prediction (83.6%, κ = 0.51), followed by ER (81.8%, κ = 0.44) and PR (76.4%, κ = 0.39). For high Ki-67 expression, sensitivity was 88.9%, with moderate overall agreement (κ = 0.55). Agreement between the two prediction rounds was substantial to almost perfect for all biomarkers (κ = 0.69 to 0.83).
Conclusion: ChatGPT-4o successfully predicted IHC biomarker status using only structured textual data. Its performance was comparable to that of radiomics models, offering a feasible and accessible AI tool to support early clinical decision-making, especially in resource-limited settings or before IHC results are available.
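The metrics named in the Methods (accuracy, sensitivity, specificity, and Cohen's kappa) all derive from a 2×2 table of paired binary calls. The sketch below shows one way to compute them, assuming each biomarker is reduced to a positive (1) / negative (0) call; the abstract does not state the cut-offs used, and the function name `binary_metrics` and the toy labels are hypothetical illustrations, not the authors' code or data. Kappa is computed as (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is the agreement expected by chance from each rater's marginal totals.

```python
from collections import Counter
from typing import Sequence


def binary_metrics(truth: Sequence[int], pred: Sequence[int]) -> dict[str, float]:
    """Hypothetical helper: accuracy, sensitivity, specificity and Cohen's kappa
    for paired binary labels, where 1 = positive and 0 = negative."""
    if len(truth) != len(pred) or len(truth) == 0:
        raise ValueError("truth and pred must be non-empty and of equal length")
    if any(x not in (0, 1) for x in (*truth, *pred)):
        raise ValueError("labels must be 0 or 1")

    # Count each cell of the 2x2 table as (reference, prediction) pairs.
    counts = Counter(zip(truth, pred))
    tp, tn = counts[(1, 1)], counts[(0, 0)]
    fp, fn = counts[(0, 1)], counts[(1, 0)]
    n = len(truth)

    p_o = (tp + tn) / n  # observed agreement, identical to accuracy
    # Chance agreement: product of the two raters' marginal rates, per class.
    p_e = ((tp + fn) / n) * ((tp + fp) / n) + ((tn + fp) / n) * ((tn + fn) / n)
    kappa = (p_o - p_e) / (1 - p_e) if p_e != 1 else float("nan")

    return {
        "accuracy": p_o,
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Toy labels for illustration only; these are NOT data from the study.
    truth = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
    pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
    print({k: round(v, 3) for k, v in binary_metrics(truth, pred).items()})
    # prints {'accuracy': 0.8, 'sensitivity': 0.8, 'specificity': 0.8, 'kappa': 0.6}
```

The same function can quantify reproducibility by passing the two prediction rounds as the two label sequences; in that case only the kappa value is meaningful, because neither round is a reference standard.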