Alexander Jurayj, Julio Nerys-Figueroa, Emil Espinal, Michael A. Gaudiani, Travis Baes, Jared Mahylis, Stephanie Muh
From the Department of Orthopaedic Surgery, Henry Ford Hospital, Detroit, MI.
J Am Acad Orthop Surg Glob Res Rev. 2025 Mar 11;9(3). doi: 10.5435/JAAOSGlobal-D-24-00289. eCollection 2025 Mar 1.
To evaluate ChatGPT's (OpenAI) ability to provide accurate, appropriate, and readable responses to common patient questions about rotator cuff tears.
Eight questions from the OrthoInfo rotator cuff tear web page were input into ChatGPT in two forms: a standard prompt and a prompt requesting a response at a sixth-grade reading level. Five orthopaedic surgeons rated the accuracy and appropriateness of each response on a Likert scale, and readability was measured with the Flesch-Kincaid Grade Level. Results were analyzed with a paired Student t-test; an illustrative sketch of this type of analysis follows.
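The sketch below is not the authors' code; it is a minimal illustration, assuming Python with the scipy and textstat packages and made-up rating data, of how paired Likert ratings and a Flesch-Kincaid Grade Level score could be computed as described in the methods.

```python
# Illustrative only; hypothetical data, not the study's dataset.
from scipy import stats
import textstat

# Hypothetical per-question accuracy ratings (1-5 Likert) for the same eight
# questions: standard ChatGPT responses vs. sixth-grade-level responses.
standard_scores = [5, 5, 4, 5, 5, 4, 5, 5]
sixth_grade_scores = [4, 3, 4, 3, 4, 3, 4, 4]

# Paired Student t-test, the comparison reported in the study.
t_stat, p_value = stats.ttest_rel(standard_scores, sixth_grade_scores)
print(f"paired t = {t_stat:.2f}, p = {p_value:.4f}")

# Flesch-Kincaid Grade Level of a sample response text.
sample_response = "A rotator cuff tear is an injury to the tendons of the shoulder."
print("Flesch-Kincaid grade level:", textstat.flesch_kincaid_grade(sample_response))
```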
Standard ChatGPT responses scored higher in accuracy (4.7 ± 0.47 vs. 3.6 ± 0.76; P < 0.001) and appropriateness (4.5 ± 0.57 vs. 3.7 ± 0.98; P < 0.001) compared with sixth-grade responses. However, standard ChatGPT responses were less accurate (4.7 ± 0.47 vs. 5.0 ± 0.0; P = 0.004) and appropriate (4.5 ± 0.57 vs. 5.0 ± 0.0; P = 0.016) when compared with OrthoInfo responses. OrthoInfo responses were also notably better than sixth-grade responses in both accuracy and appropriateness (P < 0.001). Standard responses had a higher Flesch-Kincaid grade level compared with both OrthoInfo and sixth-grade responses (P < 0.001).
Standard ChatGPT responses were less accurate and appropriate, with worse readability compared with OrthoInfo responses. Despite being easier to read, sixth-grade level ChatGPT responses compromised on accuracy and appropriateness. At this time, ChatGPT is not recommended as a standalone source for patient information on rotator cuff tears but may supplement information provided by orthopaedic surgeons.