Evaluating ChatGPT as a Standalone Tool for Patient Education: A Review of Frequently Asked Questions by Patients With Chronic Obstructive Pulmonary Disease.
Yizhu Yin, Zara Riaz, Rafael Amoro Sanchez, Ahmed Mustafa, Ashkan Eighaei Sedeh
Abstract
AIM: Chronic obstructive pulmonary disease (COPD) is a leading cause of mortality worldwide. Patient education remains a critical tool in managing chronic conditions, including COPD, and artificial intelligence (AI) chatbots have drawn attention as a potential aid because they are evolving rapidly and are freely available to the public. However, questions remain about the accuracy and reliability of the information these tools offer. This study aims to evaluate the accuracy and reproducibility of the responses provided by Chat Generative Pre-trained Transformer (ChatGPT), version 4o (OpenAI, San Francisco, CA), a frequently utilized AI chatbot, to common questions asked by COPD patients.
MATERIALS AND METHODS: A set of 44 patient-centered questions regarding COPD was selected and reviewed by physicians experienced in managing pulmonary disorders to ensure quality and relevance. Each question was submitted to ChatGPT (version 4o) three times by study staff, using separate accounts. A majority response, defined as one repeated two or more times, was identified for each question and assessed for accuracy and reproducibility by two physicians experienced in COPD management, using a structured rubric. Responses were classified as accurate (complete responses in line with practice guidelines), partially accurate (omitted some information but relayed key points and did not share false information), or inaccurate (false or misleading information). In case of a disagreement in scoring, a consensus review was performed by a third physician.
RESULTS: The mean accuracy score was moderate (0.61, SD ± 0.14), with 20.5% (9/44) of responses fully accurate, 79.5% (35/44) partially accurate, and none inaccurate. The mean score differed significantly from the hypothetical perfect score (t=-19.14, p<0.001). Reproducibility was high: responses were consistent across the three iterations for 93.2% (41/44; 95% CI: 81.4-98.3%) of the questions.
CONCLUSION: While ChatGPT responds consistently to COPD-related queries, it cannot yet serve as a standalone patient-education tool. Although none of the responses were inaccurate, suggesting a potentially safe resource, oversimplification and omission of key information are major limiting factors that lead to partial and incomplete responses and raise concerns about potential misinformation. ChatGPT may serve as a supplementary tool in COPD-related disease management, but its use must be accompanied by validation from healthcare professionals.
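The arithmetic behind the reported results can be checked with a minimal Python sketch. It rests on two assumptions that the abstract does not state: that the comparison against the "hypothetical perfect score" was a one-sample t-test, and that the perfect score equals 1.0. The helper names `percent` and `one_sample_t` are illustrative only. Because the inputs are the rounded summary values (mean 0.61, SD 0.14), the computed statistic only approximates the reported t=-19.14.

```python
import math


def percent(count, total):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


def one_sample_t(mean, sd, n, reference):
    """One-sample t statistic computed from summary statistics.

    Assumption: the abstract does not name the test, so this is
    only a plausible reconstruction, not the authors' procedure.
    """
    standard_error = sd / math.sqrt(n)
    return (mean - reference) / standard_error


n_questions = 44

fully_accurate = percent(9, n_questions)       # reported as 20.5%
partially_accurate = percent(35, n_questions)  # reported as 79.5%
reproducible = percent(41, n_questions)        # reported as 93.2%

# Assumed reference value of 1.0 for a "perfect" accuracy score.
t_stat = one_sample_t(mean=0.61, sd=0.14, n=n_questions, reference=1.0)

print(fully_accurate, partially_accurate, reproducible)
print(round(t_stat, 2))
```

With the rounded inputs, the statistic comes out near -18.5 rather than -19.14. Unrounded values within the rounding range (for example, a mean of 0.605 and an SD of 0.135) yield values on either side of the reported figure, so the gap is consistent with rounding of the summary statistics.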