Assessment of quality and readability of information provided by ChatGPT in relation to developmental dysplasia of the hip and periacetabular osteotomy.
Vincent J Leopold, Stephen Fahy, Carsten Perka, Jens Goronzy, George Grammatopoulos, Paul E Beaulé, Sebastian Hardt
Abstract
This study evaluates the quality and readability of responses given by ChatGPT 4 relating to common patient queries on Developmental Dysplasia of the Hip (DDH) and Periacetabular Osteotomy (PAO). Frequently asked questions on DDH and PAO were selected from online Patient Education Materials and posed to ChatGPT 4. The responses were evaluated by four high-volume PAO surgeons using a well-established evidence-based rating system, categorizing responses from 'excellent response not requiring clarification' to 'unsatisfactory requiring substantial clarification'. Readability assessments were subsequently conducted to determine the literacy level required to understand the content provided. Responses from ChatGPT 4 varied significantly between preoperative and postoperative queries. In the postoperative category, 50% of responses were rated as 'excellent', showing no need for further clarification, while the preoperative responses frequently required minimal to moderate clarification. The overall median response rating was 'satisfactory requiring minimal clarification'. Readability tests showed that the average Reading Grade Level was 13.44, considerably higher than the recommended sixth-grade level for patient education materials, indicating a substantial barrier to comprehension for the general public. While ChatGPT delivers generally reliable information, the complexity of its language is a major barrier to its widespread use as a tool for patient education. Future iterations of ChatGPT should aim to use simpler language, thereby enhancing accessibility without compromising content quality.