Hackensack Meridian School of Medicine, Nutley, New Jersey, USA.
J Med Ethics. 2024 Jan 23;50(2):97-101. doi: 10.1136/jme-2023-109366.
Chat Generative Pre-Trained Transformer (ChatGPT) has been a growing point of interest in medical education yet has not been assessed in the field of bioethics. This study evaluated the accuracy of ChatGPT-3.5 (April 2023 version) in answering text-based, multiple-choice bioethics questions at the level of US third-year and fourth-year medical students. A total of 114 bioethics questions were identified from the widely utilised question banks UWorld and AMBOSS. Accuracy, bioethical categories, difficulty levels, specialty data, error types and character count were analysed. We found that ChatGPT had an overall accuracy of 59.6%, performing better on topics surrounding death and the patient-physician relationship and poorly on questions pertaining to informed consent. Of all the specialties, it performed best in paediatrics, although certain specialties and bioethical categories were under-represented in the question set. Among the errors made, it tended towards content errors and application errors. There was no significant association between character count and accuracy. Nevertheless, this investigation contributes to the ongoing dialogue on artificial intelligence's (AI) role in healthcare and medical education, advocating for further research to fully understand AI systems' capabilities and constraints in the nuanced field of medical bioethics.