Bharatha Ambadasu, Ojeh Nkemcho, Fazle Rabbi Ahbab Mohammad, Campbell Michael H, Krishnamurthy Kandamaran, Layne-Yarde Rhaheem N A, Kumar Alok, Springer Dale C R, Connell Kenneth L, Majumder Md Anwarul Azim
Faculty of Medical Sciences, The University of the West Indies, Bridgetown, Barbados.
Department of Population Sciences, University of Dhaka, Dhaka, Bangladesh.
Adv Med Educ Pract. 2024 May 10;15:393-400. doi: 10.2147/AMEP.S457408. eCollection 2024.
This study compared the performance of ChatGPT-4 with that of medical students in answering multiple-choice questions (MCQs), using the revised Bloom's Taxonomy as a benchmark.
A cross-sectional study was conducted at The University of the West Indies, Barbados. ChatGPT-4 and medical students were assessed, via computer-based testing, on MCQs drawn from various medical courses.
The study included 304 MCQs. Students demonstrated good knowledge, with 78% correctly answering at least 90% of the questions. However, ChatGPT-4 achieved a higher overall score (73.7%) than the students (66.7%). Course type significantly affected ChatGPT-4's performance, whereas revised Bloom's Taxonomy level did not. A detailed analysis of the questions ChatGPT-4 answered correctly showed a highly significant association between program level and Bloom's Taxonomy level (p < 0.001), reflecting a concentration of "remember"-level questions in preclinical courses and "evaluate"-level questions in clinical courses.
The study highlights ChatGPT-4's proficiency on standardized tests but points to its limitations in clinical reasoning and practical skills. This performance discrepancy suggests that the effectiveness of artificial intelligence (AI) varies with course content.
While ChatGPT-4 shows promise as an educational tool, its role should be supplementary, with strategic integration into medical education to leverage its strengths and address its limitations. Further research is needed to explore AI's impact on medical education and on student performance across educational levels and courses.