Artificial Intelligence Augmentation: Performance of GPT-4 and GPT-3.5 on the Plastic Surgery In-service Examination.

Author information

Najafali Daniel, Reiche Erik, Araya Sthefano, Orellana Manuel, Liu Farrah C, Camacho Justin M, Patel Sameer A, Broyles Justin M, Dorafshar Amir H, Morrison Shane D, Knoedler Leonard, Fox Paige M

Affiliations

From the Carle Illinois College of Medicine, University of Illinois Urbana-Champaign, Urbana, IL.

Division of Plastic and Reconstructive Surgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA.

Publication information

Plast Reconstr Surg Glob Open. 2025 Apr 10;13(4):e6645. doi: 10.1097/GOX.0000000000006645. eCollection 2025 Apr.

DOI:10.1097/GOX.0000000000006645

PMID:40212094

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11984779/

Abstract

BACKGROUND

ChatGPT-3.5 scored in the 52nd percentile on the Plastic Surgery In-service Examination, placing its knowledge on par with that of a first-year integrated resident. The updated GPT-4 may perform better, given its more expansive training set. We hypothesized that GPT-4 would outperform its predecessor, making it a more valuable potential asset to surgical education.

METHODS

Questions from the 2022 Plastic Surgery In-service Examination were given to GPT-4 and GPT-3.5. Both models were prompted using 3 different structures. The 2022 American Society of Plastic Surgeons Norm Tables were used to compare the chatbots' performance with national metrics for plastic surgery residents.
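The three prompting structures can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' protocol: the `ExamQuestion` type, the `build_prompts` function, and all prompt wording are hypothetical, and the sketch assumes that the open-ended structure withholds the answer choices from the model.

```python
from dataclasses import dataclass


@dataclass
class ExamQuestion:
    """One in-service exam item (hypothetical structure for illustration)."""
    stem: str
    choices: dict[str, str]  # letter -> option text, e.g. {"A": "...", "B": "..."}


def build_prompts(q: ExamQuestion) -> dict[str, str]:
    """Return one prompt per structure; wording is illustrative, not the study's."""
    # List the options in letter order regardless of how the dict was built.
    options = "\n".join(f"{letter}. {text}" for letter, text in sorted(q.choices.items()))
    return {
        # Open ended (assumed): only the stem, answer choices withheld.
        "open_ended": q.stem,
        # Multiple choice: stem plus lettered options, single-letter answer.
        "multiple_choice": f"{q.stem}\n{options}\nAnswer with a single letter.",
        # MC with explanation: same options, but the model must justify its choice.
        "mc_with_explanation": f"{q.stem}\n{options}\nChoose one letter and explain your reasoning.",
    }


if __name__ == "__main__":
    demo = ExamQuestion(stem="Example question stem?", choices={"A": "first option", "B": "second option"})
    for name, prompt in build_prompts(demo).items():
        print(f"--- {name} ---\n{prompt}\n")
```

Keeping the stem identical across all three variants means that any accuracy difference between strategies can be attributed to the prompt structure rather than to the question content.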

RESULTS

GPT-4 answered a total of 237 questions with an overall accuracy of 63% across all 3 strategies. Accuracy by prompting scheme was 54% for open ended, 67% for multiple choice (MC), and 68% for MC with explanation. Across all strategies, the section with the highest accuracy was Section 4: Breast and Cosmetic (74%). GPT-4's highest-scoring methodology (MC with explanation, 68%) placed it in the following national integrated percentiles: 93rd percentile for the first year, 76th percentile for the second year, 52nd percentile for the third year, 34th percentile for the fourth year, 17th percentile for the fifth year, and 15th percentile for the sixth year. GPT-3.5 scored 58% overall.
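As a quick consistency check on the figures above, the overall GPT-4 accuracy can be recomputed from the three per-strategy accuracies. The sketch assumes each strategy was run on the same 237 questions, so the overall figure is the unweighted mean of the three; because the reported percentages are rounded, agreement with 63% is a sanity check rather than an exact recomputation. Variable names are illustrative.

```python
from statistics import mean

# Per-strategy GPT-4 accuracies reported in the abstract (percent).
gpt4_accuracy = {
    "open_ended": 54,
    "multiple_choice": 67,
    "mc_with_explanation": 68,
}
GPT35_OVERALL = 58  # GPT-3.5 overall accuracy reported in the abstract (percent)

# Assumption: every strategy covered the same question set, so each
# strategy gets equal weight and the overall accuracy is a plain mean.
gpt4_overall = mean(gpt4_accuracy.values())
best_strategy = max(gpt4_accuracy, key=gpt4_accuracy.get)
gap = gpt4_overall - GPT35_OVERALL

print(f"GPT-4 overall: {gpt4_overall:.0f}%")
print(f"Best strategy: {best_strategy} ({gpt4_accuracy[best_strategy]}%)")
print(f"Gap over GPT-3.5: {gap:.0f} percentage points")
```

With the reported values, this prints an overall accuracy of 63%, identifies MC with explanation as the best strategy, and shows a 5-percentage-point gap over GPT-3.5.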

CONCLUSIONS

GPT-4 outperformed its predecessor but scored only in the 15th percentile compared with postgraduate year-6 residents. Further refinement is needed for it to achieve performance metrics equivalent to those of an attending plastic surgeon and to become a valuable tool for surgical education.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1317/11984779/4e70565c3b3d/gox-13-e6645-g001.jpg

Similar articles

ChatGPT Is Equivalent to First-Year Plastic Surgery Residents: Evaluation of ChatGPT on the Plastic Surgery In-Service Examination.

Aesthet Surg J. 2023 Nov 16;43(12):NP1085-NP1089. doi: 10.1093/asj/sjad130.

ChatGPT-4 Surpasses Residents: A Study of Artificial Intelligence Competency in Plastic Surgery In-service Examinations and Its Advancements from ChatGPT-3.5.

Plast Reconstr Surg Glob Open. 2024 Sep 5;12(9):e6136. doi: 10.1097/GOX.0000000000006136. eCollection 2024 Sep.

Bard Versus the 2022 American Society of Plastic Surgeons In-Service Examination: Performance on the Examination in Its Intern Year.

Aesthet Surg J Open Forum. 2023 Jul 19;6:ojad066. doi: 10.1093/asjof/ojad066. eCollection 2024.

GPT-4 Artificial Intelligence Model Outperforms ChatGPT, Medical Students, and Neurosurgery Residents on Neurosurgery Written Board-Like Questions.

World Neurosurg. 2023 Nov;179:e160-e165. doi: 10.1016/j.wneu.2023.08.042. Epub 2023 Aug 18.

The Rapid Development of Artificial Intelligence: GPT-4's Performance on Orthopedic Surgery Board Questions.

Orthopedics. 2024 Mar-Apr;47(2):e85-e89. doi: 10.3928/01477447-20230922-05. Epub 2023 Sep 27.

Applying GPT-4 to the Plastic Surgery Inservice Training Examination.

J Plast Reconstr Aesthet Surg. 2023 Dec;87:78-82. doi: 10.1016/j.bjps.2023.09.027. Epub 2023 Sep 14.

Advancements in AI Medical Education: Assessing ChatGPT's Performance on USMLE-Style Questions Across Topics and Difficulty Levels.

Cureus. 2024 Dec 24;16(12):e76309. doi: 10.7759/cureus.76309. eCollection 2024 Dec.

Performance Comparison of ChatGPT-4 and Japanese Medical Residents in the General Medicine In-Training Examination: Comparison Study.

JMIR Med Educ. 2023 Dec 6;9:e52202. doi: 10.2196/52202.

Can Artificial Intelligence Pass the American Board of Orthopaedic Surgery Examination? Orthopaedic Residents Versus ChatGPT.

Clin Orthop Relat Res. 2023 Aug 1;481(8):1623-1630. doi: 10.1097/CORR.0000000000002704. Epub 2023 May 23.
