Exploring the Performance of ChatGPT in an Orthopaedic Setting and Its Potential Use as an Educational Tool.

Authors

Drouaud Arthur, Stocchi Carolina, Tang Justin, Gonsalves Grant, Cheung Zoe, Szatkowski Jan, Forsh David

Affiliations

George Washington University School of Medicine, Washington, District of Columbia.

Department of Orthopaedic Surgery, Mount Sinai, New York, New York.

Publication information

JB JS Open Access. 2024 Nov 26;9(4). doi: 10.2106/JBJS.OA.24.00081. eCollection 2024 Oct-Dec.

DOI: 10.2106/JBJS.OA.24.00081
PMID: 39600798
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11584220/
Abstract

INTRODUCTION

We assessed ChatGPT-4 vision (GPT-4V)'s performance for image interpretation, diagnosis formulation, and patient management capabilities. We aim to shed light on its potential as an educational tool addressing real-life cases for medical students.

METHODS

Ten of the most popular orthopaedic trauma cases from OrthoBullets were selected. GPT-4V interpreted medical imaging and patient information, providing diagnoses and guiding responses to OrthoBullets questions. Four fellowship-trained orthopaedic trauma surgeons rated GPT-4V's responses using a 5-point Likert scale (strongly disagree to strongly agree). Each of GPT-4V's answers was assessed for alignment with current medical knowledge (accuracy), whether its reasoning was logical (rationale), relevance to the specific case (relevance), and whether surgeons would trust the answer (trustworthiness). Mean scores from surgeon ratings were calculated.

RESULTS

In total, 10 clinical cases, comprising 97 questions, were analyzed (10 imaging, 35 management, and 52 treatment). The surgeons assigned a mean overall rating of 3.46/5.00 to GPT-4V's imaging response (accuracy 3.28, rationale 3.68, relevance 3.75, and trustworthiness 3.15). Management questions received an overall score of 3.76 (accuracy 3.61, rationale 3.84, relevance 4.01, and trustworthiness 3.58), while treatment questions had an average overall score of 4.04 (accuracy 3.99, rationale 4.08, relevance 4.15, and trustworthiness 3.93).
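The reported figures can be cross-checked with a few lines of arithmetic. The sketch below assumes that each overall score is the unweighted mean of the four criterion scores; the abstract does not state how the overall score was derived, so this is an assumption that happens to match the published numbers to within rounding. The dictionary layout and the helper name `criteria_mean` are illustrative only.

```python
from statistics import mean

# Scores copied from the Results paragraph of the abstract.
# "n" is the number of questions in each category.
reported = {
    "imaging": {
        "n": 10, "overall": 3.46,
        "criteria": {"accuracy": 3.28, "rationale": 3.68,
                     "relevance": 3.75, "trustworthiness": 3.15},
    },
    "management": {
        "n": 35, "overall": 3.76,
        "criteria": {"accuracy": 3.61, "rationale": 3.84,
                     "relevance": 4.01, "trustworthiness": 3.58},
    },
    "treatment": {
        "n": 52, "overall": 4.04,
        "criteria": {"accuracy": 3.99, "rationale": 4.08,
                     "relevance": 4.15, "trustworthiness": 3.93},
    },
}


def criteria_mean(category: str) -> float:
    """Unweighted mean of the four criterion scores (an assumed aggregation)."""
    return mean(list(reported[category]["criteria"].values()))


# The three categories should add up to the 97 questions stated in the abstract.
total_questions = sum(entry["n"] for entry in reported.values())
print(f"total questions: {total_questions}")

for name, entry in reported.items():
    print(f"{name}: reported overall {entry['overall']:.2f}, "
          f"mean of criteria {criteria_mean(name):.3f}")
```

Under this assumption the recomputed means differ from the reported overall scores by less than 0.01 in every category, which is consistent with rounding of the published values.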

CONCLUSION

This is the first study evaluating GPT-4V's imaging interpretation, personalized management, and treatment approaches as a medical educational tool. Surgeon ratings indicate overall fair agreement with GPT-4V's reasoning behind decision-making. GPT-4V performed less favorably in imaging interpretation than in its management and treatment approaches. The performance of GPT-4V falls below our fellowship-trained orthopaedic trauma surgeons' standards as a standalone tool for medical education.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/96da/11584220/35c48bfc09f3/jbjsoa-9-e24.00081-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/96da/11584220/5eaeb12cad36/jbjsoa-9-e24.00081-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/96da/11584220/c9526e58e2a4/jbjsoa-9-e24.00081-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/96da/11584220/4a86c80b4d94/jbjsoa-9-e24.00081-g004.jpg

Similar articles

1
Exploring the Performance of ChatGPT in an Orthopaedic Setting and Its Potential Use as an Educational Tool.
JB JS Open Access. 2024 Nov 26;9(4). doi: 10.2106/JBJS.OA.24.00081. eCollection 2024 Oct-Dec.
2
Unveiling GPT-4V's hidden challenges behind high accuracy on USMLE questions: Observational Study.
J Med Internet Res. 2025 Feb 7;27:e65146. doi: 10.2196/65146.
3
Step into the era of large multimodal models: a pilot study on ChatGPT-4V(ision)'s ability to interpret radiological images.
Int J Surg. 2024 Jul 1;110(7):4096-4102. doi: 10.1097/JS9.0000000000001359.
4
Hidden Flaws Behind Expert-Level Accuracy of Multimodal GPT-4 Vision in Medicine.
ArXiv. 2024 Aug 31:arXiv:2401.08396v4.
5
Hidden flaws behind expert-level accuracy of multimodal GPT-4 vision in medicine.
NPJ Digit Med. 2024 Jul 23;7(1):190. doi: 10.1038/s41746-024-01185-7.
6
Evaluating GPT-4V's performance in the Japanese national dental examination: A challenge explored.
J Dent Sci. 2024 Jul;19(3):1595-1600. doi: 10.1016/j.jds.2023.12.007. Epub 2023 Dec 22.
7
Assessing GPT-4 multimodal performance in radiological image analysis.
Eur Radiol. 2025 Apr;35(4):1959-1965. doi: 10.1007/s00330-024-11035-5. Epub 2024 Aug 30.
8
A Comparison Between GPT-3.5, GPT-4, and GPT-4V: Can the Large Language Model (ChatGPT) Pass the Japanese Board of Orthopaedic Surgery Examination?
Cureus. 2024 Mar 18;16(3):e56402. doi: 10.7759/cureus.56402. eCollection 2024 Mar.
9
Glaucoma Detection and Feature Identification via GPT-4V Fundus Image Analysis.
Ophthalmol Sci. 2024 Nov 29;5(2):100667. doi: 10.1016/j.xops.2024.100667. eCollection 2025 Mar-Apr.
10
Evaluating the efficacy of few-shot learning for GPT-4Vision in neurodegenerative disease histopathology: A comparative analysis with convolutional neural network model.
Neuropathol Appl Neurobiol. 2024 Aug;50(4):e12997. doi: 10.1111/nan.12997.