Examining the Role of Large Language Models in Orthopedics: Systematic Review.

Affiliations

Department of Orthopaedics, Peking University Third Hospital, Beijing, China.

Engineering Research Center of Bone and Joint Precision Medicine, Ministry of Education, Beijing, China.

Publication information

J Med Internet Res. 2024 Nov 15;26:e59607. doi: 10.2196/59607.

DOI:10.2196/59607
PMID:39546795
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11607553/
Abstract

BACKGROUND

Large language models (LLMs) can understand natural language and generate corresponding text, images, and even videos based on prompts, which holds great potential in medical scenarios. Orthopedics is a significant branch of medicine, and orthopedic diseases contribute to a significant socioeconomic burden, which could be alleviated by the application of LLMs. Several pioneers in orthopedics have conducted research on LLMs across various subspecialties to explore their performance in addressing different issues. However, there are currently few reviews and summaries of these studies, and a systematic summary of existing research is absent.

OBJECTIVE

The objective of this review was to comprehensively summarize research findings on the application of LLMs in the field of orthopedics and explore the potential opportunities and challenges.

METHODS

PubMed, Embase, and Cochrane Library databases were searched from January 1, 2014, to February 22, 2024, with the language limited to English. The terms, which included variants of "large language model," "generative artificial intelligence," "ChatGPT," and "orthopaedics," were divided into 2 categories: large language model and orthopedics. After completing the search, the study selection process was conducted according to the inclusion and exclusion criteria. The quality of the included studies was assessed using the revised Cochrane risk-of-bias tool for randomized trials and CONSORT-AI (Consolidated Standards of Reporting Trials-Artificial Intelligence) guidance. Data extraction and synthesis were conducted after the quality assessment.
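The two-category search described above can be sketched as a Boolean query: variants within a category are joined with OR, and the two categories are joined with AND. This is only an illustration. The abstract does not give the authors' actual search strings, so the helper name `build_query`, the quoting style, and the spelling variant "orthopedics" are assumptions.

```python
# Hypothetical sketch of a two-category Boolean search string.
# The real search strategy is not given in the abstract; the term
# variants and the query syntax below are assumptions.

def build_query(categories):
    """OR the terms inside each category, then AND the categories together."""
    groups = []
    for terms in categories.values():
        quoted = " OR ".join(f'"{term}"' for term in terms)
        groups.append(f"({quoted})")
    return " AND ".join(groups)


categories = {
    # Category 1: large language model terms named in the abstract
    "large language model": [
        "large language model",
        "generative artificial intelligence",
        "ChatGPT",
    ],
    # Category 2: orthopedics terms ("orthopedics" is an assumed variant)
    "orthopedics": ["orthopaedics", "orthopedics"],
}

query = build_query(categories)
print(query)
```

Each database (PubMed, Embase, Cochrane Library) has its own field tags and date-limit syntax, so a string like this would need adapting before use.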

RESULTS

A total of 68 studies were selected. The application of LLMs in orthopedics involved the fields of clinical practice, education, research, and management. Of these 68 studies, 47 (69%) focused on clinical practice, 12 (18%) addressed orthopedic education, 8 (12%) were related to scientific research, and 1 (1%) pertained to the field of management. Of the 68 studies, only 8 (12%) recruited patients, and only 1 (1%) was a high-quality randomized controlled trial. ChatGPT was the most commonly mentioned LLM tool. There was considerable heterogeneity in the definition, measurement, and evaluation of the LLMs' performance across the different studies. For diagnostic tasks alone, the accuracy ranged from 55% to 93%. In disease classification tasks, the accuracy of ChatGPT with GPT-4 ranged from 2% to 100%. In answering orthopedic examination questions, the scores ranged from 45% to 73.6%, depending on the model and the test selected.
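As a quick arithmetic check, the reported percentages can be recomputed from the study counts given above. The snippet is a minimal sketch: the counts come from the abstract, while the variable and function names are illustrative.

```python
# Recompute the reported shares from the counts in the abstract.
# Names (TOTAL_STUDIES, counts, percent_share) are illustrative only.

TOTAL_STUDIES = 68

counts = {
    "clinical practice": 47,
    "education": 12,
    "research": 8,
    "management": 1,
}

# The four categories should account for every included study.
assert sum(counts.values()) == TOTAL_STUDIES


def percent_share(count, total):
    """Return the share of `count` in `total`, rounded to a whole percent."""
    return round(100 * count / total)


shares = {field: percent_share(n, TOTAL_STUDIES) for field, n in counts.items()}

for field, share in shares.items():
    print(f"{field}: {counts[field]}/{TOTAL_STUDIES} = {share}%")
```

The same function applied to the 8 studies that recruited patients gives 12%, and the single high-quality randomized controlled trial gives 1%, both matching the text.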

CONCLUSIONS

LLMs cannot replace orthopedic professionals in the short term. However, using LLMs as copilots could be a potential approach to effectively enhance work efficiency at present. More high-quality clinical trials are needed in the future, aiming to identify optimal applications of LLMs and advance orthopedics toward higher efficiency and precision.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fcc2/11607553/38cd4c9ff81e/jmir_v26i1e59607_fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fcc2/11607553/48221895a890/jmir_v26i1e59607_fig1.jpg