Nottingham Centre for Public Health and Epidemiology, University of Nottingham, Nottingham City Hospital, Hucknall Rd, Nottingham, NG5 1PB, England.
NHS England, Seaton House, City Link, London Road, Nottingham, NG2 4LA, England.
BMC Med Educ. 2024 Jan 11;24(1):57. doi: 10.1186/s12909-024-05042-9.
Artificial intelligence-based large language models, like ChatGPT, have been rapidly assessed for both risks and potential in health-related assessment and learning. However, their applications in public health professional exams have not yet been studied. We evaluated the performance of ChatGPT in part of the Faculty of Public Health's Diplomate exam (DFPH).
ChatGPT was provided with a bank of 119 publicly available DFPH question parts from past papers. Its performance was assessed by two active DFPH examiners. The degree of insight and level of understanding apparently displayed by ChatGPT was also assessed.
ChatGPT passed 3 of 4 papers, surpassing the current pass rate. It performed best on questions relating to research methods. Its answers had a high floor. Examiners identified ChatGPT answers with 73.6% accuracy and human answers with 28.6% accuracy. ChatGPT provided a mean of 3.6 unique insights per question and appeared to demonstrate a required level of learning on 71.4% of occasions.
Large language models have rapidly increasing potential as a learning tool in public health education. However, their factual fallibility and the difficulty of distinguishing their responses from those of humans pose potential threats to teaching and learning.