Performance of Artificial Intelligence in Addressing Questions Regarding the Management of Pediatric Supracondylar Humerus Fractures.

Authors

Milner John D, Quinn Matthew S, Schmitt Phillip, Knebel Ashley, Henstenburg Jeffrey, Nasreddine Adam, Boulos Alexandre R, Schiller Jonathan R, Eberson Craig P, Cruz Aristides I

Affiliations

Department of Orthopaedic Surgery, Brown University, Warren Alpert Medical School, Providence, RI, USA.

Division of Sports Medicine, Boston Children's Hospital, Boston, MA, USA.

Publication information

J Pediatr Soc North Am. 2025 Mar 9;11:100164. doi: 10.1016/j.jposna.2025.100164. eCollection 2025 May.

DOI: 10.1016/j.jposna.2025.100164
PMID: 40432855
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12088213/
Abstract

BACKGROUND

The widespread accessibility of artificial intelligence (AI) has enabled its use in medicine to improve patient education, augment patient-physician communication, support research efforts, and enhance medical student education. However, there is significant concern that these models may provide responses that are incorrect, biased, or lacking in the nuance and complexity required for best-practice clinical decision-making. Currently, there is a paucity of literature comparing the quality and reliability of AI-generated responses. The purpose of this study was to assess the ability of ChatGPT and Gemini to generate responses to the 2022 American Academy of Orthopaedic Surgeons (AAOS) current practice guidelines on pediatric supracondylar humerus fractures. We hypothesized that both ChatGPT and Gemini would demonstrate high-quality, evidence-based responses with no significant difference between the models across evaluation criteria.

METHODS

The responses from ChatGPT and Gemini to prompts based on the 14 AAOS guidelines were evaluated by seven fellowship-trained pediatric orthopaedic surgeons using a questionnaire that assessed five key characteristics on a scale from 1 to 5. The prompts were categorized into nonoperative or preoperative management and diagnosis, surgical timing and technique, and rehabilitation and prevention. Statistical analysis included mean scores, standard deviations, and two-sided t-tests to compare the performance of ChatGPT and Gemini. Scores were then evaluated for inter-rater reliability.
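The comparison described above (mean, standard deviation, and a two-sided t-test on two sets of ratings) can be sketched roughly as follows. This is an illustration only, not the authors' analysis: the abstract does not say whether the t-tests were paired or independent, so the sketch assumes Welch's unequal-variance form, and the ratings, variable names, and function names are hypothetical.

```python
import math
import statistics


def summarize(scores):
    """Return (mean, sample standard deviation) for a list of ratings."""
    return statistics.mean(scores), statistics.stdev(scores)


def welch_t(a, b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Assumption: the two rating sets are treated as independent samples
    with possibly unequal variances.
    """
    mean_a, sd_a = summarize(a)
    mean_b, sd_b = summarize(b)
    # Squared standard error of each sample mean.
    se2_a = sd_a ** 2 / len(a)
    se2_b = sd_b ** 2 / len(b)
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


# Hypothetical 1-to-5 ratings for one criterion; NOT data from the study.
model_a_scores = [2, 3, 2, 1, 2, 3, 2]
model_b_scores = [4, 4, 3, 5, 4, 4, 3]

t_stat, dof = welch_t(model_a_scores, model_b_scores)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

The two-sided P value is then the probability, under a t distribution with `dof` degrees of freedom, of a statistic at least as extreme as `abs(t_stat)`. With SciPy installed, `scipy.stats.ttest_ind(model_a_scores, model_b_scores, equal_var=False)` performs the same Welch test and returns the P value directly.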

RESULTS

ChatGPT and Gemini demonstrated consistent performance across the criteria, with high mean scores across all criteria except for evidence-based responses. Mean scores were highest for clarity (ChatGPT: 3.745 ± 0.237, Gemini: 4.388 ± 0.154) and lowest for evidence-based responses (ChatGPT: 1.816 ± 0.181, Gemini: 3.765 ± 0.229). There were notable statistically significant differences across all criteria, with Gemini having higher mean scores in each criterion (P < .001). Gemini achieved statistically higher ratings in the relevance (P = .03) and evidence-based (P < .001) criteria. Both large language models (LLMs) performed comparably in the accuracy, clarity, and completeness criteria (P > .05).

CONCLUSIONS

ChatGPT and Gemini produced responses aligned with the 2022 AAOS current guideline practices for pediatric supracondylar humerus fractures. Gemini outperformed ChatGPT across all criteria, with the greatest difference in scores seen in the evidence-based category. This study emphasizes the potential for LLMs, particularly Gemini, to provide pertinent clinical information for managing pediatric supracondylar humerus fractures.

KEY CONCEPTS

(1) The accessibility of artificial intelligence has enabled its utilization in medicine to improve patient education, support research efforts, enhance medical student education, and augment patient-physician communications.
(2) There is a significant concern that artificial intelligence may provide responses that are incorrect, biased, or lacking in the required nuance and complexity of best practice clinical decision-making.
(3) There is a paucity of literature comparing the quality and reliability of AI-generated responses regarding management of pediatric supracondylar humerus fractures.
(4) In our study, both ChatGPT and Gemini produced responses that were well aligned with the AAOS current guideline practices for pediatric supracondylar humerus fractures; however, Gemini outperformed ChatGPT across all criteria, with the greatest difference in scores seen in the evidence-based category.

LEVEL OF EVIDENCE

Level II.
