DiDonna Nicole, Shetty Pragna N, Khan Kamran, Damitz Lynn
From the School of Medicine, University of North Carolina, Chapel Hill, N.C.
Division of Plastic and Reconstructive Surgery, University of North Carolina, Chapel Hill, N.C.
Plast Reconstr Surg Glob Open. 2024 Jun 21;12(6):e5929. doi: 10.1097/GOX.0000000000005929. eCollection 2024 Jun.
Within the last few years, artificial intelligence (AI) chatbots have sparked fascination for their potential as an educational tool. Although it has been documented that one such chatbot, ChatGPT, is capable of performing at a moderate level on plastic surgery examinations and has the capacity to become a beneficial educational tool, the potential of other chatbots remains unexplored.
To investigate the efficacy of AI chatbots in plastic surgery education, performance on the 2019-2023 Plastic Surgery In-service Training Examination (PSITE) was compared among seven popular AI platforms: ChatGPT-3.5, ChatGPT-4.0, Google Bard, Google PaLM, Microsoft Bing AI, Claude, and My AI by Snapchat. Answers were evaluated for accuracy and incorrect responses were characterized by question category and error type.
ChatGPT-4.0 outperformed the other platforms, reaching accuracy rates up to 79%. On the 2023 PSITE, ChatGPT-4.0 ranked in the 95th percentile of first-year residents; however, relative performance worsened when compared with upper-level residents, with the platform ranking in the 12th percentile of sixth-year residents. The performance among other chatbots was comparable, with their average PSITE score (2019-2023) ranging from 48.6% to 57.0%.
Results of our study indicate that ChatGPT-4.0 has potential as an educational tool in the field of plastic surgery; however, given the poor performance of the other chatbots on the PSITE, their use should be approached with caution at this time. To our knowledge, this is the first article comparing the performance of multiple AI chatbots within the realm of plastic surgery education.