ChatGPT in general surgery: a cross-sectional study assessing its response to patient questions.
Sebastian Lünse, Eric L Wisotzky, Johannes Höhn, Christoph Paasch, Frank Meyer, Richard Hunger, René Mantke
Abstract
Background: The artificial intelligence-based large language model ChatGPT (OpenAI, San Francisco/CA, USA) has taken human-machine interaction to the next level since its launch in November 2022. As a program that mimics human conversations over a text-based communication interface, ChatGPT has the potential to be used in a variety of healthcare settings, including patient information. The present study aimed to assess the abilities of ChatGPT in responding to patient questions regarding general surgery. Methods: In this questionnaire-based cross-sectional study, a total of 30 questions commonly asked by patients about appendicitis, cholecystitis, and inguinal hernia repair were submitted to ChatGPT (version 3.5) on April 23, 2024. The responses were assessed by experienced surgeons from three German university hospitals using a modified global quality scale (GQS), a 5-point scale ranging from 1 ("very poor") to 5 ("excellent"). Readability was assessed using the Flesch-Kincaid Reading Ease (FKRE) score and the Simple Measure of Gobbledygook (SMOG). Results: The study included 15 participating surgeons. The mean GQS score for ChatGPT-generated responses to patient questions was 4.2 ± 0.88 (mean ± standard deviation), reflecting good quality. The mean FKRE score was 24.3 ± 7.35 and the mean SMOG score was 13.9 ± 1.22, indicating a difficult to very difficult reading level best suited for university graduates. The majority of participating surgeons (n = 9, 60%) indicated that they would use ChatGPT after appropriate further development. Conclusion: The artificial intelligence-based large language model ChatGPT has enormous potential to become a useful tool for providing information to patients about general surgery.
However, given its limitations, such as low readability for a general audience and non-evidence-based sources of information, improvements are needed before it is implemented in surgery-related patient interactions.
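For readers unfamiliar with the two readability measures named in the Methods, the following is a minimal sketch of how they are commonly computed. It uses the standard published formulas for Flesch Reading Ease and SMOG; the function names and the vowel-group syllable counter are illustrative assumptions, and nothing here is taken from the authors' actual tooling, which the abstract does not describe. Dedicated readability tools count syllables more carefully, so scores from this sketch will differ somewhat from theirs.

```python
import math
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count vowel groups, minus a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def split_text(text: str):
    """Split text into sentences (on ., !, ?) and alphabetic words."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    return sentences, words


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    sentences, words = split_text(text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * len(words) / len(sentences)
            - 84.6 * syllables / len(words))


def smog_grade(text: str) -> float:
    """SMOG grade: estimated years of education needed to understand the text."""
    sentences, words = split_text(text)
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    return 1.0430 * math.sqrt(polysyllables * 30 / len(sentences)) + 3.1291


if __name__ == "__main__":
    sample = ("Laparoscopic cholecystectomy necessitates comprehensive "
              "preoperative evaluation.")
    print(round(flesch_reading_ease(sample), 1), round(smog_grade(sample), 1))
```

On the Flesch scale, lower values indicate harder text, which is why a mean near 24 is read as difficult to very difficult, while a SMOG value near 14 corresponds to roughly 14 years of schooling.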