

ChatGPT Yields a Passing Score on a Pediatric Board Preparatory Exam but Raises Red Flags.

Authors

Le Mindy, Davis Michael

Affiliations

University of Florida College of Medicine, Gainesville, FL, USA.

Publication

Glob Pediatr Health. 2024 Mar 24;11:2333794X241240327. doi: 10.1177/2333794X241240327. eCollection 2024.

DOI: 10.1177/2333794X241240327
PMID: 38529337
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10962030/
Abstract

OBJECTIVES

We aimed to evaluate the performance of a publicly available online artificial intelligence program (OpenAI's ChatGPT-3.5 and -4.0, August 3 versions) on a pediatric board preparatory examination, the 2021 and 2022 PREP Self-Assessment of the American Academy of Pediatrics (AAP).

METHODS

We entered 245 questions and answer choices from the Pediatrics 2021 PREP Self-Assessment and 247 questions and answer choices from the Pediatrics 2022 PREP Self-Assessment into OpenAI's ChatGPT-3.5 and ChatGPT-4.0, August 3 versions, in September 2023. The ChatGPT-3.5 and -4.0 scores were compared with the advertised passing score (70%+) for the PREP exams and with the average scores (74.09% and 75.71%, respectively) of the 10,715 and 6825 first-time human test takers.

RESULTS

For the AAP 2021 and 2022 PREP Self-Assessments, ChatGPT-3.5 answered 143 of 243 (58.85%) and 137 of 247 (55.46%) questions correctly on a single attempt. ChatGPT-4.0 answered 193 of 243 (79.84%) and 208 of 247 (84.21%) questions correctly.

CONCLUSION

Using a publicly available online chatbot to answer pediatric board preparatory examination questions yielded a passing score but demonstrated significant limitations in the chatbot's ability to assess some complex medical situations in children, posing a potential risk to this vulnerable population.


Similar Articles

1. ChatGPT Yields a Passing Score on a Pediatric Board Preparatory Exam but Raises Red Flags.
Glob Pediatr Health. 2024 Mar 24;11:2333794X241240327. doi: 10.1177/2333794X241240327. eCollection 2024.
2. Inadequate Performance of ChatGPT on Orthopedic Board-Style Written Exams.
Cureus. 2024 Jun 18;16(6):e62643. doi: 10.7759/cureus.62643. eCollection 2024 Jun.
3. Comparison of the Performance of Artificial Intelligence Versus Medical Professionals in the Polish Final Medical Examination.
Cureus. 2024 Aug 2;16(8):e66011. doi: 10.7759/cureus.66011. eCollection 2024 Aug.
4. Universal precautions required: Artificial intelligence takes on the Australian Medical Council's trial examination.
Aust J Gen Pract. 2023 Dec;52(12):863-865. doi: 10.31128/AJGP-02-23-6708.
5. Performance of ChatGPT and GPT-4 on Neurosurgery Written Board Examinations.
Neurosurgery. 2023 Dec 1;93(6):1353-1365. doi: 10.1227/neu.0000000000002632. Epub 2023 Aug 15.
6. Progression of an Artificial Intelligence Chatbot (ChatGPT) for Pediatric Cardiology Educational Knowledge Assessment.
Pediatr Cardiol. 2024 Feb;45(2):309-313. doi: 10.1007/s00246-023-03385-6. Epub 2024 Jan 3.
7. Artificial Intelligence for Anesthesiology Board-Style Examination Questions: Role of Large Language Models.
J Cardiothorac Vasc Anesth. 2024 May;38(5):1251-1259. doi: 10.1053/j.jvca.2024.01.032. Epub 2024 Feb 1.
8. ChatGPT-3.5 passes Poland's medical final examination-Is it possible for ChatGPT to become a doctor in Poland?
SAGE Open Med. 2024 Jun 17;12:20503121241257777. doi: 10.1177/20503121241257777. eCollection 2024.
9. ChatGPT-4 Performance on USMLE Step 1 Style Questions and Its Implications for Medical Education: A Comparative Study Across Systems and Disciplines.
Med Sci Educ. 2023 Dec 27;34(1):145-152. doi: 10.1007/s40670-023-01956-z. eCollection 2024 Feb.
10. Application of artificial intelligence chatbots, including ChatGPT, in education, scholarly work, programming, and content generation and its prospects: a narrative review.
J Educ Eval Health Prof. 2023;20:38. doi: 10.3352/jeehp.2023.20.38. Epub 2023 Dec 27.

Cited By

1. Performance of DeepSeek and GPT Models on Pediatric Board Preparation Questions: Comparative Evaluation.
JMIR AI. 2025 Aug 27;4:e76056. doi: 10.2196/76056.
2. Accuracy of Large Language Models When Answering Clinical Research Questions: Systematic Review and Network Meta-Analysis.
J Med Internet Res. 2025 Apr 30;27:e64486. doi: 10.2196/64486.
3. Performance of 5 Prominent Large Language Models in Surgical Knowledge Evaluation: A Comparative Analysis.
Mayo Clin Proc Digit Health. 2024 Jun 5;2(3):348-350. doi: 10.1016/j.mcpdig.2024.05.022. eCollection 2024 Sep.

References

1. Performance of ChatGPT as an AI-assisted decision support tool in medicine: a proof-of-concept study for interpreting symptoms and management of common cardiac conditions (AMSTELHEART-2).
Acta Cardiol. 2024 May;79(3):358-366. doi: 10.1080/00015385.2024.2303528. Epub 2024 Feb 13.
2. Automation Bias and Assistive AI: Risk of Harm From AI-Driven Clinical Decision Support.
JAMA. 2023 Dec 19;330(23):2255-2257. doi: 10.1001/jama.2023.22557.
3. Creation and Adoption of Large Language Models in Medicine.
JAMA. 2023 Sep 5;330(9):866-869. doi: 10.1001/jama.2023.14217.
4. Revolutionizing Healthcare with ChatGPT: An Early Exploration of an AI Language Model's Impact on Medicine at Large and its Role in Pediatric Surgery.
J Pediatr Surg. 2023 Dec;58(12):2410-2415. doi: 10.1016/j.jpedsurg.2023.07.008. Epub 2023 Jul 20.
5. Large language models in medicine.
Nat Med. 2023 Aug;29(8):1930-1940. doi: 10.1038/s41591-023-02448-8. Epub 2023 Jul 17.
6. Snakebite Advice and Counseling From Artificial Intelligence: An Acute Venomous Snakebite Consultation With ChatGPT.
Cureus. 2023 Jun 13;15(6):e40351. doi: 10.7759/cureus.40351. eCollection 2023 Jun.
7. Putting ChatGPT's Medical Advice to the (Turing) Test: Survey Study.
JMIR Med Educ. 2023 Jul 10;9:e46939. doi: 10.2196/46939.
8. Performance of ChatGPT, GPT-4, and Google Bard on a Neurosurgery Oral Boards Preparation Question Bank.
Neurosurgery. 2023 Nov 1;93(5):1090-1098. doi: 10.1227/neu.0000000000002551. Epub 2023 Jun 12.
9. Evaluating Chatbot Efficacy for Answering Frequently Asked Questions in Plastic Surgery: A ChatGPT Case Study Focused on Breast Augmentation.
Aesthet Surg J. 2023 Sep 14;43(10):1126-1135. doi: 10.1093/asj/sjad140.
10. Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum.
JAMA Intern Med. 2023 Jun 1;183(6):589-596. doi: 10.1001/jamainternmed.2023.1838.