Unveiling the Potential of AI in Plastic Surgery Education: A Comparative Study of Leading AI Platforms' Performance on In-training Examinations.

Author information

DiDonna Nicole, Shetty Pragna N, Khan Kamran, Damitz Lynn

Affiliations

From the School of Medicine, University of North Carolina, Chapel Hill, N.C.

Division of Plastic and Reconstructive Surgery, University of North Carolina, Chapel Hill, N.C.

Publication information

Plast Reconstr Surg Glob Open. 2024 Jun 21;12(6):e5929. doi: 10.1097/GOX.0000000000005929. eCollection 2024 Jun.

Abstract

BACKGROUND

Within the last few years, artificial intelligence (AI) chatbots have sparked fascination for their potential as an educational tool. Although it has been documented that one such chatbot, ChatGPT, is capable of performing at a moderate level on plastic surgery examinations and has the capacity to become a beneficial educational tool, the potential of other chatbots remains unexplored.

METHODS

To investigate the efficacy of AI chatbots in plastic surgery education, performance on the 2019-2023 Plastic Surgery In-service Training Examination (PSITE) was compared among seven popular AI platforms: ChatGPT-3.5, ChatGPT-4.0, Google Bard, Google PaLM, Microsoft Bing AI, Claude, and My AI by Snapchat. Answers were evaluated for accuracy and incorrect responses were characterized by question category and error type.

RESULTS

ChatGPT-4.0 outperformed the other platforms, reaching accuracy rates of up to 79%. On the 2023 PSITE, ChatGPT-4.0 ranked in the 95th percentile of first-year residents; however, its relative performance worsened when compared with upper-level residents, with the platform ranking in the 12th percentile of sixth-year residents. The other chatbots performed comparably to one another, with average PSITE scores (2019-2023) ranging from 48.6% to 57.0%.

CONCLUSIONS

Results of our study indicate that ChatGPT-4.0 has potential as an educational tool in the field of plastic surgery; however, given the poor performance of the other chatbots on the PSITE, their use should be cautioned against at this time. To our knowledge, this is the first article comparing the performance of multiple AI chatbots within the realm of plastic surgery education.
