Hubany Shannon S, Scala Fernanda D, Hashemi Kiana, Kapoor Saumya, Fedorova Julia R, Vaccaro Matthew J, Ridout Rees P, Hedman Casey C, Kellogg Brian C, Leto Barone Angelo A
From the University of Central Florida College of Medicine, Orlando, Fla.
Division of Craniofacial and Pediatric Plastic Surgery, Nemours Children's Hospital, Orlando, Fla.
Plast Reconstr Surg Glob Open. 2024 Sep 5;12(9):e6136. doi: 10.1097/GOX.0000000000006136. eCollection 2024 Sep.
ChatGPT, launched in 2022 and updated to Generative Pre-trained Transformer 4 (GPT-4) in 2023, is a large language model trained on extensive data, including medical information. This study compares ChatGPT's performance on Plastic Surgery In-Service Examinations with medical residents nationally as well as its earlier version, ChatGPT-3.5.
This study reviewed 1500 questions from the Plastic Surgery In-Service Examinations from 2018 to 2023. After excluding image-based, unscored, and inconclusive questions, 1292 were analyzed. The question stem and each multiple-choice answer were input verbatim into ChatGPT-4.
ChatGPT-4 correctly answered 961 (74.4%) of the included questions. Performance by section was best in core surgical principles (79.1% correct) and lowest in craniomaxillofacial (69.1%). ChatGPT-4 ranked between the 61st and 97th percentiles compared with all residents. ChatGPT-4 also significantly outperformed ChatGPT-3.5 on the 2018-2022 examinations (P < 0.001): ChatGPT-3.5 averaged 55.5% correctness, whereas ChatGPT-4 averaged 74%, a mean difference of 18.54%. In 2021, ChatGPT-3.5 ranked in the 23rd percentile of all residents, whereas ChatGPT-4 ranked in the 97th percentile. ChatGPT-4 outperformed 80.7% of residents on average and scored above the 97th percentile among first-year residents. Its performance was comparable with that of sixth-year integrated residents, ranking in the 55.7th percentile on average. These results show significant improvements in ChatGPT-4's application of medical knowledge within six months of ChatGPT-3.5's release.
This study reveals ChatGPT-4's rapid developments, advancing from a first-year medical resident's level to surpassing independent residents and matching a sixth-year resident's proficiency.