Department of Pathology, Microbiology and Forensic Medicine, School of Medicine, The University of Jordan, Amman, 11942, Jordan.
Department of Clinical Laboratories and Forensic Medicine, Jordan University Hospital, Queen Rania Al-Abdullah Street-Aljubeiha, P.O. Box: 13046, Amman, 11942, Jordan.
BMC Res Notes. 2024 Sep 3;17(1):247. doi: 10.1186/s13104-024-06920-7.
The integration of artificial intelligence (AI) in healthcare education is inevitable. Understanding the proficiency of generative AI in different languages to answer complex questions is crucial for educational purposes. The study objective was to compare the performance of ChatGPT-4 and Gemini in answering Virology multiple-choice questions (MCQs) in English and Arabic, while assessing the quality of the generated content. Both AI models' responses to 40 Virology MCQs were assessed for correctness and quality based on the CLEAR tool designed for the evaluation of AI-generated content. The MCQs were classified into lower and higher cognitive categories based on the revised Bloom's taxonomy. The study design followed the METRICS checklist for the design and reporting of generative AI-based studies in healthcare.
ChatGPT-4 and Gemini performed better in English than in Arabic, with ChatGPT-4 consistently surpassing Gemini in correctness and CLEAR scores. ChatGPT-4 led Gemini with 80% vs. 62.5% correctness in English, compared to 65% vs. 55% in Arabic. Both AI models performed better on items in the lower cognitive domains. Both ChatGPT-4 and Gemini exhibited potential in educational applications; nevertheless, their performance varied across languages, highlighting the importance of continued development to ensure effective AI integration in healthcare education globally.
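The reported percentages translate directly into correct-answer counts over the 40-item MCQ bank. The following minimal sketch tabulates those counts per model and language (counts are implied by the stated percentages, not the raw study data; variable names are illustrative):

```python
# Correct-answer counts implied by the reported percentages over 40 MCQs.
TOTAL_MCQS = 40

results = {
    ("ChatGPT-4", "English"): 32,  # 32/40 = 80%
    ("Gemini",    "English"): 25,  # 25/40 = 62.5%
    ("ChatGPT-4", "Arabic"):  26,  # 26/40 = 65%
    ("Gemini",    "Arabic"):  22,  # 22/40 = 55%
}

def correctness_pct(correct: int, total: int = TOTAL_MCQS) -> float:
    """Return the correctness rate as a percentage."""
    return 100.0 * correct / total

for (model, language), correct in results.items():
    print(f"{model} ({language}): {correct}/{TOTAL_MCQS} = "
          f"{correctness_pct(correct):.1f}%")
```

This tabulation reproduces the 80% vs. 62.5% (English) and 65% vs. 55% (Arabic) gaps summarized above.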