Applying GPT-4 to the Plastic Surgery Inservice Training Examination.

Author Affiliations

Division of Plastic Surgery, Department of Surgery, St. Louis University School of Medicine, St. Louis, MO, USA.

Department of Plastic Surgery, Rutgers New Jersey School of Medicine, Newark, NJ, USA.

Publication Information

J Plast Reconstr Aesthet Surg. 2023 Dec;87:78-82. doi: 10.1016/j.bjps.2023.09.027. Epub 2023 Sep 14.

Abstract

BACKGROUND

The recent introduction of Generative Pre-trained Transformer (GPT)-4 has demonstrated the potential to improve upon ChatGPT-3.5; GPT-4 is widely regarded as a more reliable and creative version of GPT-3.5.

OBJECTIVE

In conjunction with our prior manuscript, we sought to determine whether GPT-4 could serve as a tool for plastic surgery graduate medical education by evaluating its performance on the Plastic Surgery Inservice Training Examination (PSITE).

METHODS

Sample assessment questions from the 2022 PSITE were obtained from the American Council of Academic Plastic Surgeons website and manually entered into GPT-4. GPT-4's responses were evaluated using properties of natural coherence. Incorrect answers were stratified into the following categories: informational, logical, or explicit fallacy.
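The study entered questions into GPT-4 manually through the chat interface. As a minimal sketch of how the same workflow could be automated, assuming the OpenAI Python client, an API key in the environment, and a hypothetical question/choices format (the prompt wording and the example item below are illustrative assumptions, not the authors' protocol):

```python
# Illustrative sketch only; the study entered questions manually via the chat UI.
# Assumes the `openai` Python client and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

def ask_gpt4(question: str, choices: dict[str, str]) -> str:
    """Send one multiple-choice PSITE-style item to GPT-4 and return its raw answer."""
    options = "\n".join(f"{letter}. {text}" for letter, text in choices.items())
    prompt = (
        "Answer the following multiple-choice question and explain your reasoning.\n\n"
        f"{question}\n{options}"
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Hypothetical example item (not an actual PSITE question).
answer = ask_gpt4(
    "Which flap is most commonly used for autologous breast reconstruction?",
    {"A": "DIEP flap", "B": "Radial forearm flap", "C": "Fibula flap", "D": "Groin flap"},
)
print(answer)
```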

RESULTS

From a total of 242 questions, GPT-4 answered 187 correctly, for an accuracy of 77.3%. Logical reasoning was used in 95.0% of questions, internal information in 98.3%, and external information in 97.5%. When questions were stratified by correct versus incorrect responses, a statistically significant difference was identified in GPT-4's use of logical reasoning.
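As a check on the reported accuracy, and to illustrate how the correct-versus-incorrect comparison of logical-reasoning use might be tested, here is a brief sketch; the 2x2 counts are placeholders, since the abstract reports only overall percentages, and Fisher's exact test is an assumed choice rather than the authors' stated method:

```python
# Accuracy reported in the abstract: 187 correct out of 242 questions.
correct, total = 187, 242
print(f"Accuracy: {correct / total:.1%}")  # -> 77.3%

# Sketch of a correct-vs-incorrect comparison of logical-reasoning use.
# The counts below are PLACEHOLDERS (only the 187/55 row totals come from
# the abstract), and Fisher's exact test is an assumed choice of test.
from scipy.stats import fisher_exact

#            used logic   no logic
table = [
    [180, 7],   # correct responses (placeholder split of 187)
    [50, 5],    # incorrect responses (placeholder split of 55)
]
odds_ratio, p_value = fisher_exact(table)
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.3f}")
```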

CONCLUSION

GPT-4 has been shown to be more accurate and reliable for plastic surgery resident education than GPT-3.5. Users should consider utilizing the tool to enhance their educational curricula. Those who adopt such models may be better equipped to deliver high-quality care to their patients.
