
Do ChatGPT and Gemini's Recommendations Align With Established Guidelines for Hand and Upper Extremity Surgery?

Author Information

Zhang Yibin B, Fischer Fielding S, Abola Matthew V, Osei Daniel A, Wolfe Scott W, Amen Troy B

Affiliations

Harvard Medical School, Boston, MA, USA.

Hospital for Special Surgery, New York, NY, USA.

Publication Information

Hand (N Y). 2025 Sep 18:15589447251371089. doi: 10.1177/15589447251371089.

Abstract

BACKGROUND

The use of large language models (LLMs) such as ChatGPT and Gemini in clinical settings has surged, presenting potential benefits in reducing administrative workload and enhancing patient communication. However, concerns about the clinical accuracy of these tools persist. This study evaluated the concordance of ChatGPT and Gemini's recommendations with American Academy of Orthopedic Surgeons (AAOS) clinical practice guidelines (CPGs) for carpal tunnel syndrome, distal radius fractures, and glenohumeral joint osteoarthritis.

METHODS

ChatGPT (version 4o) and Gemini (version 1.5 Flash) were queried using structured text-based prompts aligned with AAOS CPGs. The LLMs' outputs were analyzed by blinded reviewers to determine concordance with the guidelines. Concordance rates were compared across models, topics, and guideline strength using descriptive statistics and McNemar's test. The transparency of responses, including source citation, was also assessed.
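The pairwise model comparison described above can be sketched with a minimal exact McNemar test, which considers only the discordant pairs (prompts where exactly one model matched the guideline). The counts used below are hypothetical illustrations, not the study's actual contingency data:

```python
from math import comb

def mcnemar_exact(b: int, c: int) -> float:
    """Exact (binomial) two-sided McNemar p-value for paired binary outcomes.

    b: pairs where model A was concordant with the guideline but model B was not
    c: pairs where model B was concordant but model A was not
    Concordant-concordant and discordant-discordant pairs do not enter the test.
    """
    n = b + c
    k = min(b, c)
    # Under H0 (no difference between models), b ~ Binomial(n, 0.5);
    # double the smaller tail for a two-sided p-value, capped at 1.
    p = 2 * sum(comb(n, i) * 0.5**n for i in range(k + 1))
    return min(p, 1.0)

# Hypothetical discordant-pair counts for illustration only
print(round(mcnemar_exact(15, 7), 4))
```

With perfectly balanced discordant counts the p-value is 1.0, reflecting no evidence of a difference; the test's power depends entirely on the number of discordant pairs, not the total number of prompts.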

RESULTS

A total of 174 recommendations were generated, with an overall concordance rate of 62.1%. Concordance did not differ significantly between ChatGPT and Gemini (66.7% vs 57.5%, P = .131). Concordance varied by topic and guideline strength, with ChatGPT performing best for moderately supported guidelines. Both models demonstrated low citation transparency: Gemini provided sources for 39.1% of recommendations, significantly more than ChatGPT's 3.5% (P < .0001).

CONCLUSIONS

Despite modest concordance rates, both models exhibited significant limitations, including variability across topics and guideline strengths, as well as insufficient citation transparency. These findings highlight the challenges in integrating LLMs into clinical practice and emphasize the need for further refinement and evaluation before adoption in hand surgery.


