Bharatha Ambadasu, Ojeh Nkemcho, Fazle Rabbi Ahbab Mohammad, Campbell Michael H, Krishnamurthy Kandamaran, Layne-Yarde Rhaheem N A, Kumar Alok, Springer Dale C R, Connell Kenneth L, Majumder Md Anwarul Azim
Faculty of Medical Sciences, The University of the West Indies, Bridgetown, Barbados.
Department of Population Sciences, University of Dhaka, Dhaka, Bangladesh.
Adv Med Educ Pract. 2024 May 10;15:393-400. doi: 10.2147/AMEP.S457408. eCollection 2024.
This study compared the performance of ChatGPT-4 with that of medical students in answering multiple-choice questions (MCQs), using the revised Bloom's Taxonomy as a benchmark.
A cross-sectional study was conducted at The University of the West Indies, Barbados. ChatGPT-4 and medical students were assessed, via computer-based testing, on MCQs drawn from various medical courses.
The study included 304 MCQs. Students demonstrated good knowledge, with 78% correctly answering at least 90% of the questions. However, ChatGPT-4 achieved a higher overall score (73.7%) than the students (66.7%). Course type significantly affected ChatGPT-4's performance, whereas revised Bloom's Taxonomy level did not. A detailed analysis of the questions ChatGPT-4 answered correctly showed a highly significant association between program level and Bloom's Taxonomy level (p < 0.001), reflecting a concentration of "remember"-level questions in preclinical courses and "evaluate"-level questions in clinical courses.
The study highlights ChatGPT-4's proficiency on standardized tests but points to its limitations in clinical reasoning and practical skills. This performance discrepancy suggests that the effectiveness of artificial intelligence (AI) varies with course content.
While ChatGPT-4 shows promise as an educational tool, its role should be supplementary, with strategic integration into medical education to leverage its strengths and address its limitations. Further research is needed to explore AI's impact on medical education and on student performance across educational levels and courses.