Systematic Review on Large Language Models in Orthopaedic Surgery.

Authors

Mo Kevin, Lin Rowen, Dunn Evan, Girgis Gio, Fang William, Walsh John, Banyai-Flores Nicole, Watson Troy, Lee Daniel

Affiliations

Orthopaedic Surgery, Valley Hospital Medical Center, 620 Shadow Ln, Las Vegas, NV 89106, USA.

Touro University Nevada College of Osteopathic Medicine, 874 American Pacific Dr, Henderson, NV 89104, USA.

Publication information

J Clin Med. 2025 Aug 20;14(16):5876. doi: 10.3390/jcm14165876.

DOI:10.3390/jcm14165876
PMID:40869701
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12386971/
Abstract

Background/Objectives: Since ChatGPT was released in 2022, many Large Language Models (LLM) have been developed, showing potential to expand the field of orthopaedic surgery. This is the first systematic review looking at the current state of research of LLMs in orthopaedic surgery. The aim of this study is to identify which LLMs are researched, assess their functionalities, and evaluate their quality of results.

Methods: The systematic review was conducted using PubMed, Embase, and Cochrane Library databases in accordance with Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.

Results: A total of 60 studies were included in the final review, all of which included ChatGPT versions 3.0 or 4.0. There were five studies that included Bard and one article each for Perplexity AI and Bing. Most studies assessed orthopaedic assessment questions (23 studies) and their ability to correctly answer free ended questions (31 studies). Outcome measures used to assess the accuracy of LLMs in most of the included studies were the percentage of correct answers on multiple-choice questions or expert-graded consensus to open-ended responses. The accuracy of ChatGPT 4.0 in orthopaedic assessment questions ranged from 47.2 to 73.6% without images, and 35.7-65.85% with images. The accuracy of ChatGPT 3.5 was 29.4-55.8% without images and 22.4-46.34% with images. The accuracy of Bard ranged from 49.8 to 58%. Orthopaedic residents consistently scored better than LLMs in the range of 74.2-75.3%.

Conclusions: ChatGPT 4 showed significant improvement over ChatGPT 3.5 in answering orthopaedic assessment questions. When comparing performances of orthopaedic residents to LLMs, orthopaedic residents scored higher overall. There remains significant opportunity for development of LLM performance on orthopaedic assessments as well as image-based analysis and clinical documentation.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7dbf/12386971/7836cfae597b/jcm-14-05876-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7dbf/12386971/869217985bc0/jcm-14-05876-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7dbf/12386971/ee2cd7e07f28/jcm-14-05876-g002.jpg

Similar articles

1
Systematic Review on Large Language Models in Orthopaedic Surgery.
J Clin Med. 2025 Aug 20;14(16):5876. doi: 10.3390/jcm14165876.
2
Examining the Role of Large Language Models in Orthopedics: Systematic Review.
J Med Internet Res. 2024 Nov 15;26:e59607. doi: 10.2196/59607.
3
Falls prevention interventions for community-dwelling older adults: systematic review and meta-analysis of benefits, harms, and patient values and preferences.
Syst Rev. 2024 Nov 26;13(1):289. doi: 10.1186/s13643-024-02681-3.
4
Prescription of Controlled Substances: Benefits and Risks
5
Large Language Models and Empathy: Systematic Review.
J Med Internet Res. 2024 Dec 11;26:e52597. doi: 10.2196/52597.
6
Performance of artificial intelligence in bariatric surgery: comparative analysis of ChatGPT-4, Bing, and Bard in the American Society for Metabolic and Bariatric Surgery textbook of bariatric surgery questions.
Surg Obes Relat Dis. 2024 Jul;20(7):609-613. doi: 10.1016/j.soard.2024.04.014. Epub 2024 May 8.
7
Applications and Concerns of ChatGPT and Other Conversational Large Language Models in Health Care: Systematic Review.
J Med Internet Res. 2024 Nov 7;26:e22769. doi: 10.2196/22769.
8
[Preliminary exploration of the applications of five large language models in the field of oral auxiliary diagnosis, treatment and health consultation].
Zhonghua Kou Qiang Yi Xue Za Zhi. 2025 Jul 30;60(8):871-878. doi: 10.3760/cma.j.cn112144-20241107-00418.
9
Comparative Analysis of LLMs' Performance On a Practice Radiography Certification Exam.
Radiol Technol. 2025 May-Jun;96(5):334-342.
10
Harnessing artificial intelligence in bariatric surgery: comparative analysis of ChatGPT-4, Bing, and Bard in generating clinician-level bariatric surgery recommendations.
Surg Obes Relat Dis. 2024 Jul;20(7):603-608. doi: 10.1016/j.soard.2024.03.011. Epub 2024 Mar 24.
