
Similar Articles

1. Evaluating the performance of ChatGPT-3.5 and ChatGPT-4 on the Taiwan plastic surgery board examination.
Heliyon. 2024 Jul 18;10(14):e34851. doi: 10.1016/j.heliyon.2024.e34851. eCollection 2024 Jul 30.
2. How Does ChatGPT Perform on the United States Medical Licensing Examination (USMLE)? The Implications of Large Language Models for Medical Education and Knowledge Assessment.
JMIR Med Educ. 2023 Feb 8;9:e45312. doi: 10.2196/45312.
3. Comparison of the problem-solving performance of ChatGPT-3.5, ChatGPT-4, Bing Chat, and Bard for the Korean emergency medicine board examination question bank.
Medicine (Baltimore). 2024 Mar 1;103(9):e37325. doi: 10.1097/MD.0000000000037325.
4. ChatGPT failed Taiwan's Family Medicine Board Exam.
J Chin Med Assoc. 2023 Aug 1;86(8):762-766. doi: 10.1097/JCMA.0000000000000946. Epub 2023 Jun 9.
5. ChatGPT's Performance on the Hand Surgery Self-Assessment Exam: A Critical Analysis.
J Hand Surg Glob Online. 2024 Jan 2;6(2):200-205. doi: 10.1016/j.jhsg.2023.11.014. eCollection 2024 Mar.
6. Exploring the Performance of ChatGPT Versions 3.5, 4, and 4 With Vision in the Chilean Medical Licensing Examination: Observational Study.
JMIR Med Educ. 2024 Apr 29;10:e55048. doi: 10.2196/55048.
7. A Novel Evaluation Model for Assessing ChatGPT on Otolaryngology-Head and Neck Surgery Certification Examinations: Performance Study.
JMIR Med Educ. 2024 Jan 16;10:e49970. doi: 10.2196/49970.
8. Performance of ChatGPT on the Taiwan urology board examination: insights into current strengths and shortcomings.
World J Urol. 2024 Apr 23;42(1):250. doi: 10.1007/s00345-024-04957-8.
9. Performance of ChatGPT incorporated chain-of-thought method in bilingual nuclear medicine physician board examinations.
Digit Health. 2024 Jan 5;10:20552076231224074. doi: 10.1177/20552076231224074. eCollection 2024 Jan-Dec.
10. Performance of ChatGPT and GPT-4 on Neurosurgery Written Board Examinations.
Neurosurgery. 2023 Dec 1;93(6):1353-1365. doi: 10.1227/neu.0000000000002632. Epub 2023 Aug 15.

Cited By

1. Comparison of artificial intelligence systems in answering prosthodontics questions from the dental specialty exam in Turkey.
J Dent Sci. 2025 Jul;20(3):1454-1459. doi: 10.1016/j.jds.2025.01.025. Epub 2025 Jan 31.
2. Performance of large language artificial intelligence models on solving restorative dentistry and endodontics student assessments.
Clin Oral Investig. 2024 Oct 7;28(11):575. doi: 10.1007/s00784-024-05968-w.
3. Custom GPTs Enhancing Performance and Evidence Compared with GPT-3.5, GPT-4, and GPT-4o? A Study on the Emergency Medicine Specialist Examination.
Healthcare (Basel). 2024 Aug 30;12(17):1726. doi: 10.3390/healthcare12171726.


Evaluating the performance of ChatGPT-3.5 and ChatGPT-4 on the Taiwan plastic surgery board examination.

Authors

Hsieh Ching-Hua, Hsieh Hsiao-Yun, Lin Hui-Ping

Affiliation

Department of Plastic Surgery, Kaohsiung Chang Gung Memorial Hospital, Chang Gung University and College of Medicine, Kaohsiung, 83301, Taiwan.

Publication

Heliyon. 2024 Jul 18;10(14):e34851. doi: 10.1016/j.heliyon.2024.e34851. eCollection 2024 Jul 30.

DOI: 10.1016/j.heliyon.2024.e34851
PMID: 39149010
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11324965/
Abstract

BACKGROUND

Chat Generative Pre-Trained Transformer (ChatGPT) is a state-of-the-art large language model that has been evaluated across various medical fields, with mixed performance on licensing examinations. This study aimed to assess the performance of ChatGPT-3.5 and ChatGPT-4 in answering questions from the Taiwan Plastic Surgery Board Examination.

METHODS

The study evaluated the performance of ChatGPT-3.5 and ChatGPT-4 on 1375 questions from the past 8 years of the Taiwan Plastic Surgery Board Examination, including 985 single-choice and 390 multiple-choice questions. We obtained the responses between June and July 2023, launching a new chat session for each question to eliminate memory retention bias.
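The per-question protocol described above can be sketched in a few lines; the paper does not publish its evaluation harness, so `ask_model` below is a hypothetical stand-in for one fresh ChatGPT session per question, and the data structures are illustrative assumptions.

```python
def grade_exam(questions, ask_model):
    """Score an exam with one fresh model session per question.

    questions: list of dicts with 'prompt', 'answer' (set of option
    letters), and 'kind' ('single' or 'multiple').
    ask_model: callable(prompt) -> set of option letters; it is invoked
    once per question, so no conversational memory carries over between
    items (mirroring the study's new-chat-session rule).
    """
    correct = {"single": 0, "multiple": 0}
    total = {"single": 0, "multiple": 0}
    for q in questions:
        total[q["kind"]] += 1
        # A question is correct only if the full answer set matches.
        if ask_model(q["prompt"]) == q["answer"]:
            correct[q["kind"]] += 1
    return {k: correct[k] / total[k] for k in total if total[k]}

# Toy run with a scripted "model" standing in for ChatGPT.
exam = [
    {"prompt": "Q1", "answer": {"A"}, "kind": "single"},
    {"prompt": "Q2", "answer": {"B", "C"}, "kind": "multiple"},
]
scripted = {"Q1": {"A"}, "Q2": {"B"}}
print(grade_exam(exam, lambda p: scripted[p]))
# {'single': 1.0, 'multiple': 0.0}
```

Requiring the full option set to match is one common scoring convention for multiple-choice items; the paper does not state whether partial credit was given.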

RESULTS

Overall, ChatGPT-4 outperformed ChatGPT-3.5, achieving a 59% correct answer rate compared to 41% for ChatGPT-3.5. ChatGPT-4 passed five of the eight yearly exams, whereas ChatGPT-3.5 failed all eight. On single-choice questions, ChatGPT-4 scored 66% correct versus 48% for ChatGPT-3.5; on multiple-choice questions, ChatGPT-4 achieved a 43% correct rate, nearly double ChatGPT-3.5's 23%.
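The overall rates are consistent with weighting the per-type rates by question counts (985 single-choice, 390 multiple-choice), as a quick arithmetic check shows:

```python
# Overall correct rate as a question-count-weighted average of the
# single-choice and multiple-choice rates reported in the abstract.
def overall_rate(single_rate, multi_rate, n_single=985, n_multi=390):
    return (single_rate * n_single + multi_rate * n_multi) / (n_single + n_multi)

print(round(overall_rate(0.66, 0.43), 2))  # ChatGPT-4: 0.59
print(round(overall_rate(0.48, 0.23), 2))  # ChatGPT-3.5: 0.41
```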

CONCLUSION

As ChatGPT evolves, its performance on the Taiwan Plastic Surgery Board Examination is expected to improve further. The study suggests potential reforms, such as incorporating more problem-based scenarios, leveraging ChatGPT to refine exam questions, and integrating AI-assisted learning into candidate preparation. These advancements could enhance the assessment of candidates' critical thinking and problem-solving abilities in the field of plastic surgery.
