ChatGPT-4 Surpasses Residents: A Study of Artificial Intelligence Competency in Plastic Surgery In-service Examinations and Its Advancements from ChatGPT-3.5.

Author Information

Hubany Shannon S, Scala Fernanda D, Hashemi Kiana, Kapoor Saumya, Fedorova Julia R, Vaccaro Matthew J, Ridout Rees P, Hedman Casey C, Kellogg Brian C, Leto Barone Angelo A

Affiliations

From the University of Central Florida College of Medicine, Orlando, Fla.

Division of Craniofacial and Pediatric Plastic Surgery, Nemours Children's Hospital, Orlando, Fla.

Publication Information

Plast Reconstr Surg Glob Open. 2024 Sep 5;12(9):e6136. doi: 10.1097/GOX.0000000000006136. eCollection 2024 Sep.

Abstract

BACKGROUND

ChatGPT, launched in 2022 and updated to Generative Pre-trained Transformer 4 (GPT-4) in 2023, is a large language model trained on extensive data, including medical information. This study compares ChatGPT's performance on Plastic Surgery In-Service Examinations with that of medical residents nationally, and with that of its earlier version, ChatGPT-3.5.

METHODS

This study reviewed 1500 questions from the Plastic Surgery In-service Examinations from 2018 to 2023. After excluding image-based, unscored, and inconclusive questions, 1292 were analyzed. The question stem and each multiple-choice answer were entered verbatim into ChatGPT-4.

RESULTS

ChatGPT-4 correctly answered 961 (74.4%) of the included questions. Performance was best in the core surgical principles section (79.1% correct) and lowest in craniomaxillofacial (69.1%). ChatGPT-4 ranked between the 61st and 97th percentiles compared with all residents. ChatGPT-4 also significantly outperformed ChatGPT-3.5 on the 2018-2022 examinations (P < 0.001): ChatGPT-3.5 averaged 55.5% correct, whereas ChatGPT-4 averaged 74%, a mean difference of 18.54%. In 2021, ChatGPT-3.5 ranked in the 23rd percentile of all residents, whereas ChatGPT-4 ranked in the 97th percentile. ChatGPT-4 outperformed 80.7% of residents on average and scored above the 97th percentile among first-year residents. Its performance was comparable with that of sixth-year integrated residents, ranking in the 55.7th percentile on average. These results show significant improvements in ChatGPT-4's application of medical knowledge within six months of ChatGPT-3.5's release.

CONCLUSION

This study reveals ChatGPT-4's rapid developments, advancing from a first-year medical resident's level to surpassing independent residents and matching a sixth-year resident's proficiency.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e95/11377087/d20932057160/gox-12-e6136-g001.jpg
