Hayden Hartman, Maritza Diane Essis, Wei Shao Tung, Irvin Oh, Sean Peden, Arianna L. Gianakos
From the Lincoln Memorial University, DeBusk College of Osteopathic Medicine, Knoxville, TN (Hartman), and the Department of Orthopaedics and Rehabilitation, Yale University, New Haven, CT (Essis, Tung, Oh, Peden, and Gianakos).
J Am Acad Orthop Surg. 2024 Oct 15;33(16):917-923. doi: 10.5435/JAAOS-D-24-00595.
ChatGPT-4, a chatbot capable of carrying on human-like conversation, has attracted attention after demonstrating the aptitude to pass professional licensure examinations. The purpose of this study was to explore the diagnostic and decision-making capacities of ChatGPT-4 in clinical management, specifically assessing its accuracy in the identification and treatment of soft-tissue foot and ankle pathologies.
This study presented eight soft-tissue-related foot and ankle cases to ChatGPT-4, with each case assessed by three fellowship-trained foot and ankle orthopaedic surgeons. The evaluation system comprised five criteria graded on a Likert scale, yielding a total score ranging from 5 (lowest) to 25 (highest possible).
The average sum score across all cases was 22.0. The Morton neuroma case received the highest score (24.7), and the peroneal tendon tear case received the lowest score (16.3). Subgroup analyses of each of the five criteria showed no notable differences in surgeon grading. Criteria 3 (provide alternative treatments) and 4 (provide comprehensive information) were graded markedly lower than criteria 1 (diagnose), 2 (treat), and 5 (provide accurate information) (for both criteria 3 and 4: P = 0.007; P = 0.032; P < 0.0001). Criterion 5 was graded markedly higher than criteria 2, 3, and 4 (P = 0.02; P < 0.0001; P < 0.0001).
This study demonstrates that ChatGPT-4 effectively diagnosed and provided reliable treatment options for most of the soft-tissue foot and ankle cases presented, with consistency among surgeon evaluators. Assessment of the individual criteria revealed that ChatGPT-4 was most effective at diagnosing and suggesting appropriate treatment, but limitations were seen in the chatbot's ability to provide comprehensive information and alternative treatment options. In addition, the chatbot did not suggest fabricated treatment options, a common concern in prior literature. This resource could be useful for clinicians seeking reliable patient education materials without fear of inconsistencies, although comprehensive information beyond treatment may be limited.