ChatGPT Yields a Passing Score on a Pediatric Board Preparatory Exam but Raises Red Flags.

Authors

Le Mindy, Davis Michael

Affiliation

University of Florida College of Medicine, Gainesville, FL, USA.

Publication information

Glob Pediatr Health. 2024 Mar 24;11:2333794X241240327. doi: 10.1177/2333794X241240327. eCollection 2024.

DOI:10.1177/2333794X241240327

PMID:38529337

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10962030/

Abstract

OBJECTIVES

We aimed to evaluate the performance of a publicly available online artificial intelligence program (OpenAI's ChatGPT-3.5 and ChatGPT-4.0, August 3 versions) on a pediatric board preparatory examination: the 2021 and 2022 PREP Self-Assessments of the American Academy of Pediatrics (AAP).

METHODS

We entered 245 questions and answer choices from the Pediatrics 2021 PREP Self-Assessment and 247 questions and answer choices from the Pediatrics 2022 PREP Self-Assessment into OpenAI's ChatGPT-3.5 and ChatGPT-4.0, August 3 versions, in September 2023. The ChatGPT-3.5 and ChatGPT-4.0 scores were compared with the advertised passing score (70% or higher) for the PREP exams and with the average scores of 74.09% and 75.71% for all 10 715 and 6825 first-time human test takers, respectively.

RESULTS

For the AAP 2021 and 2022 PREP Self-Assessments, ChatGPT-3.5 answered 143 of 243 (58.85%) and 137 of 247 (55.46%) questions correctly on a single attempt, respectively. ChatGPT-4.0 answered 193 of 243 (79.84%) and 208 of 247 (84.21%) questions correctly, respectively.
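The comparison against the advertised passing score can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the counts are the ones quoted in RESULTS, the 70% threshold is the one quoted in METHODS, and the names `reported_counts`, `percent_correct`, and `passes` are hypothetical helpers, not part of the study. Percentages are recomputed from the counts rather than copied from the text.

```python
# Advertised PREP passing score, as stated in METHODS (assumed inclusive).
PASSING_PERCENT = 70.0

# (model, exam year) -> (questions answered correctly, questions scored),
# taken from the counts quoted in RESULTS.
reported_counts = {
    ("ChatGPT-3.5", 2021): (143, 243),
    ("ChatGPT-3.5", 2022): (137, 247),
    ("ChatGPT-4.0", 2021): (193, 243),
    ("ChatGPT-4.0", 2022): (208, 247),
}


def percent_correct(correct, total):
    """Return the share of correct answers as a percentage."""
    return 100.0 * correct / total


def passes(correct, total, threshold=PASSING_PERCENT):
    """Return True when the score meets or exceeds the passing threshold."""
    return percent_correct(correct, total) >= threshold


if __name__ == "__main__":
    for (model, year), (correct, total) in reported_counts.items():
        pct = percent_correct(correct, total)
        verdict = "passing" if passes(correct, total) else "below passing"
        print(f"{model}, PREP {year}: {correct}/{total} = {pct:.2f}% ({verdict})")
```

Under these counts, both ChatGPT-3.5 runs fall below the threshold and both ChatGPT-4.0 runs meet it.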

CONCLUSION

Using a publicly available online chatbot to answer pediatric board preparatory examination questions yielded a passing score but demonstrated significant limitations in the chatbot's ability to assess some complex medical situations in children, posing a potential risk to this vulnerable population.

