Aljindan Fahad K, Al Qurashi Abdullah A, Albalawi Ibrahim Abdullah S, Alanazi Abeer Mohammed M, Aljuhani Hussam Abdulkhaliq M, Falah Almutairi Faisal, Aldamigh Omar A, Halawani Ibrahim R, K Zino Alarki Subhi M
Department of Plastic Surgery, King Abdullah Medical City, Makkah, SAU.
College of Medicine, King Saud Bin Abdulaziz University for Health Sciences, Jeddah, SAU.
Cureus. 2023 Sep 11;15(9):e45043. doi: 10.7759/cureus.45043. eCollection 2023 Sep.
Background: The application of artificial intelligence (AI) in education is advancing rapidly, with models such as ChatGPT-4 showing potential in medical education. This study aims to evaluate the proficiency of ChatGPT-4 in answering Saudi Medical Licensing Exam (SMLE) questions.
Methodology: A dataset of 220 questions across four medical disciplines was used. The model was trained using a specific code to answer the questions accurately, and its performance was assessed by key performance indicators, difficulty level, and exam section.
Results: ChatGPT-4 demonstrated an overall accuracy of 88.6%. It showed high proficiency with [...] and [...] questions, but accuracy decreased for [...] questions. Performance was consistent across all disciplines, indicating a broad knowledge base. However, an error analysis revealed areas for further refinement, particularly with category (option) A questions across all sections.
Conclusions: This study underscores the potential of ChatGPT-4 as an AI-assisted tool in medical education, demonstrating high proficiency in answering SMLE questions. Future research is recommended to expand the scope of training and evaluation and to enhance the model's performance on complex clinical questions.