Le Mindy, Davis Michael
University of Florida College of Medicine, Gainesville, FL, USA.
Glob Pediatr Health. 2024 Mar 24;11:2333794X241240327. doi: 10.1177/2333794X241240327. eCollection 2024.
We aimed to evaluate the performance of a publicly available online artificial intelligence program (OpenAI's ChatGPT-3.5 and ChatGPT-4.0, August 3 versions) on a pediatric board preparatory examination, the 2021 and 2022 PREP Self-Assessments of the American Academy of Pediatrics (AAP).
We entered 245 questions and answer choices from the Pediatrics 2021 PREP Self-Assessment and 247 questions and answer choices from the Pediatrics 2022 PREP Self-Assessment into OpenAI's ChatGPT-3.5 and ChatGPT-4.0, August 3 versions, in September 2023. The ChatGPT-3.5 and ChatGPT-4.0 scores were compared with the advertised passing score (70% or higher) for the PREP exams and with the average scores of 74.09% and 75.71% for all 10 715 and 6825 first-time human test takers, respectively.
For the AAP 2021 and 2022 PREP Self-Assessments, ChatGPT-3.5 answered 143 of 243 (58.85%) and 137 of 247 (55.46%) questions correctly on a single attempt, respectively. ChatGPT-4.0 answered 193 of 243 (79.84%) and 208 of 247 (84.21%) questions correctly.
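The comparison described above is simple arithmetic, and a minimal Python sketch can make it explicit. It recomputes percent correct from the reported counts and checks each result against the 70% passing threshold and the human averages. The names `REPORTED`, `percent_correct`, and `summarize` are illustrative and do not come from the study, and pairing 74.09% with 2021 and 75.71% with 2022 is an assumption based on the order in which the figures are listed. Percentages recomputed from the counts may differ slightly from the reported ones (193 of 243, for example, works out to about 79.42%).

```python
# Counts as reported in the abstract: (correct, total) per model and exam year.
REPORTED = {
    ("ChatGPT-3.5", 2021): (143, 243),
    ("ChatGPT-3.5", 2022): (137, 247),
    ("ChatGPT-4.0", 2021): (193, 243),
    ("ChatGPT-4.0", 2022): (208, 247),
}

# Advertised passing score for the PREP exams (70% or higher).
PASSING_PERCENT = 70.0

# Average scores of first-time human test takers.
# Assumption: the two averages are listed in exam-year order (2021, then 2022).
HUMAN_AVERAGE = {2021: 74.09, 2022: 75.71}


def percent_correct(correct: int, total: int) -> float:
    """Return the share of correctly answered questions as a percentage."""
    return 100.0 * correct / total


def summarize(reported=REPORTED):
    """Build one summary row per (model, exam year) pair."""
    rows = []
    for (model, year), (correct, total) in reported.items():
        pct = percent_correct(correct, total)
        rows.append(
            {
                "model": model,
                "year": year,
                "percent": pct,
                "passed": pct >= PASSING_PERCENT,
                "above_human_average": pct > HUMAN_AVERAGE[year],
            }
        )
    return rows


if __name__ == "__main__":
    for row in summarize():
        print(
            f"{row['model']} {row['year']}: {row['percent']:.2f}% "
            f"passed={row['passed']} "
            f"above_human_average={row['above_human_average']}"
        )
```

Under these assumptions, only the ChatGPT-4.0 rows clear both the passing threshold and the human average.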
Using a publicly available online chatbot to answer pediatric board preparatory examination questions yielded a passing score but demonstrated significant limitations in the chatbot's ability to assess some complex medical situations in children, posing a potential risk to this vulnerable population.