Artificial Intelligence Augmentation: Performance of GPT-4 and GPT-3.5 on the Plastic Surgery In-service Examination.

Author information

Najafali Daniel, Reiche Erik, Araya Sthefano, Orellana Manuel, Liu Farrah C, Camacho Justin M, Patel Sameer A, Broyles Justin M, Dorafshar Amir H, Morrison Shane D, Knoedler Leonard, Fox Paige M

Affiliations

From the Carle Illinois College of Medicine, University of Illinois Urbana-Champaign, Urbana, IL.

Division of Plastic and Reconstructive Surgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA.

Publication information

Plast Reconstr Surg Glob Open. 2025 Apr 10;13(4):e6645. doi: 10.1097/GOX.0000000000006645. eCollection 2025 Apr.

Abstract

BACKGROUND

ChatGPT-3.5 scored in the 52nd percentile of the Plastic Surgery In-service Examination, making its knowledge equivalent to that of a first-year integrated resident. The updated GPT-4 may perform better given its more expansive training set. We hypothesized that GPT-4 would outperform its predecessor, making it a potentially more valuable asset to surgical education.

METHODS

Questions from the 2022 Plastic Surgery In-service Examination were given to GPT-4 and GPT-3.5. Both were prompted using 3 different structures. The 2022 American Society of Plastic Surgeons Norm Tables were used to compare the performance of the chatbot to national metrics from plastic surgery residents.

RESULTS

GPT-4 answered a total of 237 questions with an overall accuracy of 63% across all 3 strategies. The accuracy was as follows for the prompting schemes: 54% for open ended, 67% for multiple choice (MC), and 68% for MC with explanation. The section with the highest accuracy (74%) among all strategies was Section 4: Breast and Cosmetic. GPT-4's highest scoring methodology (MC with explanation, 68%) placed it in the following national integrated percentiles: 93rd percentile for the first year, 76th percentile for the second year, 52nd percentile for the third year, 34th percentile for the fourth year, 17th percentile for the fifth year, and 15th percentile for the sixth year. GPT-3.5 scored 58% overall.
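The overall figure can be checked against the per-strategy accuracies. The sketch below is only an arithmetic illustration, not the authors' analysis: it assumes each prompting strategy was scored on the same number of questions, so that the overall accuracy is the unweighted mean of the three reported percentages. The variable names are hypothetical.

```python
# Per-strategy GPT-4 accuracies as reported in the abstract (percent).
gpt4_accuracy_by_strategy = {
    "open ended": 54,
    "multiple choice": 67,
    "multiple choice with explanation": 68,
}

# Overall GPT-3.5 accuracy as reported in the abstract (percent).
gpt35_overall = 58

# Assumption: every strategy covers the same number of questions,
# so the overall accuracy is the plain (unweighted) mean.
gpt4_overall = sum(gpt4_accuracy_by_strategy.values()) / len(gpt4_accuracy_by_strategy)

# Difference between the two models, in percentage points.
gain = gpt4_overall - gpt35_overall

print(f"GPT-4 overall accuracy: {gpt4_overall:.0f}%")
print(f"Gain over GPT-3.5: {gain:.0f} percentage points")
```

Under that assumption, the mean of 54%, 67%, and 68% reproduces the reported 63% overall accuracy, a 5 percentage point margin over GPT-3.5.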

CONCLUSIONS

GPT-4 outperformed its predecessor but scored only in the 15th percentile compared with postgraduate year-6 residents. More refinement is needed before it achieves performance metrics equivalent to those of an attending plastic surgeon and becomes a valuable tool for surgical education.
