Department of Orthopaedic Surgery, UPMC Freddie Fu Sports Medicine Center, University of Pittsburgh, Pittsburgh, USA.
Department of Orthopaedics, Institute of Clinical Sciences, Sahlgrenska Academy, University of Gothenburg, Göteborgsvägen 31, 431 80, Mölndal, Sweden.
Knee Surg Sports Traumatol Arthrosc. 2023 Nov;31(11):5190-5198. doi: 10.1007/s00167-023-07529-2. Epub 2023 Aug 8.
To investigate the potential use of large language models (LLMs) in orthopaedics by presenting queries pertinent to anterior cruciate ligament (ACL) surgery to the Chat Generative Pre-trained Transformer (ChatGPT, specifically its GPT-4 model of March 14th, 2023). Additionally, this study aimed to evaluate the depth of the LLM's knowledge and investigate its adaptability to different user groups. It was hypothesized that ChatGPT would be able to adapt to different target groups due to its strong language understanding and processing capabilities.
ChatGPT was presented with 20 questions, and responses were requested for two distinct target audiences: patients and non-orthopaedic medical doctors. Two board-certified orthopaedic sports medicine surgeons and two expert orthopaedic sports medicine surgeons independently evaluated the responses generated by ChatGPT. Mean correctness, completeness, and adaptability to the target audience were determined. A three-point response scale facilitated nuanced assessment.
ChatGPT exhibited fair accuracy, with average correctness scores of 1.69 and 1.66 (on a scale of 0, incorrect; 1, partially correct; 2, correct) for patients and medical doctors, respectively. Three of the 20 questions (15.0%) were deemed incorrect by at least one of the four orthopaedic sports medicine surgeon assessors. Moreover, overall completeness was 1.51 and 1.64 for patients and medical doctors, respectively, while overall adaptability was 1.75 and 1.73 for patients and medical doctors, respectively.
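The aggregation behind such figures can be illustrated with a minimal sketch. It assumes that the mean is taken over all rater scores pooled together and that a question counts as "deemed incorrect" when at least one rater scored it 0; the abstract does not spell out either detail. The function name `summarize_ratings` and the example ratings are hypothetical and are not study data.

```python
from statistics import mean


def summarize_ratings(scores):
    """Summarize rater scores given as {question_id: [score per rater]}.

    Scores use the three-point scale from the abstract:
    0 = incorrect, 1 = partially correct, 2 = correct.
    Assumption: the overall mean pools every rater's score.
    """
    pooled = [s for ratings in scores.values() for s in ratings]
    overall_mean = mean(pooled)
    # A question is flagged when at least one rater scored it 0.
    flagged = [q for q, ratings in scores.items() if 0 in ratings]
    share_flagged = len(flagged) / len(scores)
    return overall_mean, flagged, share_flagged


# Hypothetical ratings from four raters for four questions (not study data).
example = {
    "Q1": [2, 2, 2, 2],
    "Q2": [2, 1, 2, 2],
    "Q3": [0, 1, 1, 2],
    "Q4": [2, 2, 1, 2],
}

overall_mean, flagged, share_flagged = summarize_ratings(example)
print(overall_mean, flagged, share_flagged)
```

Under the same flagging rule, three flagged questions out of 20 give 3 / 20 = 0.15, which matches the 15.0% reported above.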
Overall, ChatGPT was successful in generating correct responses in approximately 65% of the cases related to ACL surgery. The findings of this study imply that LLMs offer potential as a supplementary tool for acquiring orthopaedic knowledge. However, although ChatGPT can provide guidance and effectively adapt to diverse target audiences, it cannot supplant the expertise of orthopaedic sports medicine surgeons in diagnostic and treatment planning endeavours due to its limited understanding of orthopaedic domains and its potential for erroneous responses.
Level of evidence: V.