"Dr. AI Will See You Now": How Do ChatGPT-4 Treatment Recommendations Align With Orthopaedic Clinical Practice Guidelines?

Affiliation

Department of Orthopaedic Surgery, The University of Chicago, Chicago, IL, USA.

Publication information

Clin Orthop Relat Res. 2024 Dec 1;482(12):2098-2106. doi: 10.1097/CORR.0000000000003234. Epub 2024 Sep 6.

DOI: 10.1097/CORR.0000000000003234

PMID: 39246048

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11556953/

Abstract

BACKGROUND

Artificial intelligence (AI) is engineered to emulate tasks that have historically required human interaction and intellect, including learning, pattern recognition, decision-making, and problem-solving. Although AI models like ChatGPT-4 have demonstrated satisfactory performance on medical licensing exams, suggesting a potential for supporting medical diagnostics and decision-making, no study of which we are aware has evaluated the ability of these tools to make treatment recommendations when given clinical vignettes and representative medical imaging of common orthopaedic conditions. As AI continues to advance, a thorough understanding of its strengths and limitations is necessary to inform safe and helpful integration into medical practice.

QUESTIONS/PURPOSES

(1) What is the concordance of ChatGPT-4-generated treatment recommendations for common orthopaedic conditions with both the American Academy of Orthopaedic Surgeons (AAOS) clinical practice guidelines (CPGs) and an orthopaedic attending physician's treatment plan? (2) In what specific areas do the ChatGPT-4-generated treatment recommendations diverge from the AAOS CPGs?

METHODS

Ten common orthopaedic conditions with associated AAOS CPGs were identified: carpal tunnel syndrome, distal radius fracture, glenohumeral joint osteoarthritis, rotator cuff injury, clavicle fracture, hip fracture, hip osteoarthritis, knee osteoarthritis, ACL injury, and acute Achilles rupture. For each condition, the medical records of 10 deidentified patients managed at our facility were used to construct clinical vignettes, each with a single, isolated, and clearly documented diagnosis. The vignettes also spanned a range of diagnostic severity so that adherence to the AAOS treatment guidelines could be evaluated more thoroughly. These clinical vignettes were presented alongside representative radiographic imaging. The model was prompted to provide a single treatment plan recommendation. Each treatment plan was compared with the established AAOS CPGs and with the treatment plan documented by the attending orthopaedic surgeon who treated that patient. Vignettes in which ChatGPT-4 recommendations diverged from the CPGs were reviewed to identify and summarize patterns of error.

RESULTS

ChatGPT-4 provided treatment recommendations in accordance with the AAOS CPGs in 90% (90 of 100) of clinical vignettes. Concordance between ChatGPT-4-generated plans and the plan recommended by the treating orthopaedic attending physician was 78% (78 of 100). One hundred percent (30 of 30) of ChatGPT-4 recommendations for fracture vignettes and hip and knee arthritis vignettes matched with CPG recommendations, whereas the model struggled most with carpal tunnel syndrome (3 of 10 recommendations were discordant). ChatGPT-4 recommendations diverged from AAOS CPGs for three carpal tunnel syndrome vignettes; two vignettes each for ACL injury, rotator cuff injury, and glenohumeral joint osteoarthritis; and one acute Achilles rupture vignette. In these situations, ChatGPT-4 most often struggled to correctly interpret injury severity and progression, incorporate patient factors (such as lifestyle or comorbidities) into decision-making, and recognize a contraindication to surgery.
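The overall CPG concordance figure can be checked against the per-condition discordance counts reported above. The following is a minimal sketch, not the authors' analysis: the dictionary and function names are hypothetical, and it assumes that the ACL injury, rotator cuff injury, and glenohumeral joint osteoarthritis conditions had two discordant vignettes each.

```python
# Illustrative tally only; names are hypothetical and not from the study.
# Assumption: two discordant vignettes EACH for ACL injury, rotator cuff
# injury, and glenohumeral joint osteoarthritis.
VIGNETTES_PER_CONDITION = 10

# Number of vignettes per condition whose ChatGPT-4 plan diverged from the AAOS CPGs.
discordant_with_cpg = {
    "carpal tunnel syndrome": 3,
    "distal radius fracture": 0,
    "glenohumeral joint osteoarthritis": 2,
    "rotator cuff injury": 2,
    "clavicle fracture": 0,
    "hip fracture": 0,
    "hip osteoarthritis": 0,
    "knee osteoarthritis": 0,
    "ACL injury": 2,
    "acute Achilles rupture": 1,
}


def concordance(discordant, per_condition=VIGNETTES_PER_CONDITION):
    """Return (concordant count, total vignettes, concordance rate)."""
    total = len(discordant) * per_condition
    agree = total - sum(discordant.values())
    return agree, total, agree / total


agree, total, rate = concordance(discordant_with_cpg)
print(f"CPG concordance: {agree} of {total} ({rate:.0%})")
# prints "CPG concordance: 90 of 100 (90%)"
```

Under that reading, the ten discordant vignettes account exactly for the gap between 100 vignettes and the 90 concordant recommendations.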

CONCLUSION

ChatGPT-4 can generate accurate treatment plans aligned with CPGs but can also make mistakes when it is required to integrate multiple patient factors into decision-making and understand disease severity and progression. Physicians must critically assess the full clinical picture when using AI tools to support their decision-making.

CLINICAL RELEVANCE

ChatGPT-4 may be used as an on-demand diagnostic companion, but patient-centered decision-making should remain in the hands of the physician.
