The Performance of a Customized Generative Pre-trained Transformer on the American Society for Surgery of the Hand Self-Assessment Examination.

Author information

Flynn Jason C, Zeitlin Jacob, Arango Sebastian D, Pineda Nathaniel, Miller Andrew J, Weir Tristan B

Affiliations

Department of Orthopaedic Surgery, Philadelphia Hand to Shoulder Center, Philadelphia, USA.

Department of Orthopaedics, Philadelphia Hand to Shoulder Center, Philadelphia, USA.

Publication information

Cureus. 2024 Sep 25;16(9):e70205. doi: 10.7759/cureus.70205. eCollection 2024 Sep.

DOI:10.7759/cureus.70205
PMID:39463620
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11510647/
Abstract

INTRODUCTION

Multimodal large language models (MLLMs), such as OpenAI's ChatGPT (San Francisco, CA), have the potential to improve medical care delivery and education, although important shortcomings in accuracy and image interpretation have been noted. The aim of this study was to assess the multimodal performance of a ChatGPT model customized with hand surgery-specific knowledge.

METHODS

A customized generative pre-trained transformer (GPT) was trained using peer-reviewed literature recommended by the American Society for Surgery of the Hand (ASSH). Questions were taken from the ASSH 2022 Self-Assessment Examination (SAE). GPT-4 and the customized GPT were asked text-based multiple-choice questions. The customized GPT was also asked image-containing questions, both with and without access to the image(s) associated with each question.

RESULTS

A total of 192 questions were included. The customized GPT responded to the 119 text-only questions with greater accuracy than GPT-4 (107 (89.9%) versus 91 (76.5%), P = 0.001). Human examinees answered 87.3% (IQR: 71.6-93.7%) of the same text-based questions correctly. Of the 73 questions with images, the customized GPT answered 55 (75.3%) questions correctly, which dropped to 51 (69.9%) when the images were withheld (P = 0.317). The human examinees answered 87.2% (IQR: 79.4-95.4%) of image-based questions correctly.
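As a quick arithmetic check, the percentages reported above can be recomputed from the raw counts. The sketch below is illustrative only: the function and variable names are hypothetical, and it does not attempt to reproduce the P values, because the abstract does not name the statistical test used.

```python
# Counts and percentages as reported in the RESULTS paragraph.
# Keys and helper names are hypothetical labels chosen for this sketch.
REPORTED = {
    "custom_gpt_text_only": (107, 119, 89.9),
    "gpt4_text_only": (91, 119, 76.5),
    "custom_gpt_with_images": (55, 73, 75.3),
    "custom_gpt_images_withheld": (51, 73, 69.9),
}


def percent_correct(correct: int, total: int) -> float:
    """Return the share of correct answers as a percentage, to one decimal."""
    return round(100 * correct / total, 1)


def check_reported(rows=REPORTED) -> dict:
    """Map each label to True if the recomputed percentage matches the reported one."""
    return {
        label: percent_correct(correct, total) == reported
        for label, (correct, total, reported) in rows.items()
    }


if __name__ == "__main__":
    # The text-only and image-containing subsets should add up to all questions.
    assert 119 + 73 == 192
    for label, matches in check_reported().items():
        correct, total, reported = REPORTED[label]
        print(f"{label}: {correct}/{total} = {percent_correct(correct, total)}% "
              f"(reported {reported}%, match={matches})")
```

Under these assumptions all four recomputed values agree with the abstract, and the two question subsets sum to the 192 questions included.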

CONCLUSION

Our findings suggest significant improvements in ChatGPT's ability to answer text-based hand surgery questions with hand-specific training. ChatGPT is still limited in its ability to interpret images to answer questions related to hand conditions. These data show hand surgeons can create customized GPT models to provide tailored answers to specific questions, which may serve as the foundation for educational and clinical tools.
